Abstract. The spatial preferred attachment (SPA) model is a model for networked
information spaces such as domains of the World Wide Web, citation
graphs, and on-line social networks. It uses a metric space to model the hidden 
attributes of the vertices. Thus, vertices are elements of a metric space, and link 
formation depends on the metric distance between vertices. We show, through 
theoretical analysis and simulation, that for graphs formed according to the 
SPA model it is possible to infer the metric distance between vertices from the 
link structure of the graph. Precisely, the estimate is based on the number of 
common neighbours of a pair of vertices, a measure known as co-citation. To be 
able to calculate this estimate, we derive a precise relation between the number 
of common neighbours and metric distance. We also analyze the distribution of 
edge lengths, where the length of an edge is the metric distance between its end 
points. We show that this distribution has three different regimes, and that the 
tail of this distribution follows a power law. 



1. Introduction 

Thanks to the World Wide Web and its hyperlinked structure, more and more
information is becoming available in the form of a networked information space: 
a collection of information entities (documents, scientific papers, Web pages, in- 
dividuals in a social network), connected by links between pairs of entities (refer- 
ences, citations, hyperlinks, "friend" relationships). Studies of various networked 
information spaces have given convincing evidence that a significant amount of 
information about the entities represented by the vertices can be derived from the
graph representing the link structure. This has led to the application of graph- 
theoretical techniques to such graphs, with the aim of developing methods to 
understand the link structure and mine its information. 

An important step in understanding the link structure is the development of 
a graph model: a stochastic process that models the link formation. The first
generation of graph models was mainly aimed at explaining the graph-theoretical 
properties observed in real-life networks. In such models, vertices are considered 
anonymous, and link formation is only influenced by the current link structure. An 
example is the seminal model by Barabasi and Albert in fS] based on the principle 
of preferential attachment: each new vertex attaches randomly to a prescribed 
number of existing vertices, with a link probability proportional to the degree, so 
vertices of high degree are more likely to receive a link from the new vertex. 

Key words and phrases. Node similarity, co-citation, bibliographic coupling, link analysis, 
complex networks, spatial graph model, SPA model.

In networked information spaces, vertices are not only defined by their link
environment, but also by the information entity they represent. More recently, 
attempts have been made to model this alternative view of the vertices through 
spatial models. In a spatial model, vertices are embedded in a metric space, and 
link formation is influenced by the metric distance between vertices. The metric 
space is meant to be like a feature space, so that the coordinates of a vertex in this 
space represent the information associated with the vertex. For example, in text 
mining, documents are commonly represented as vectors in a word space. The 
metric is chosen so that metric distance represents similarity, i.e. vertices whose 
information entities are closely related will be at a short distance from each other 
in the metric space. 

In this paper, we focus on the Spatial Preferred Attachment (SPA) model, pro- 
posed in [1], and analyze the relationship between the link structure of graphs 
produced by the model, and the relative positions of the vertices in the metric 
space. The SPA model generates directed graphs according to the following prin- 
ciple. Vertices are points in a given metric space. Each vertex v has a sphere of 
influence. The volume of the sphere of influence of a vertex is a function of its 
in-degree. A new vertex u can only link to an existing vertex v if u falls inside the
sphere of influence of v. In that case, u links to v with probability p. The
SPA model incorporates the principle of preferential attachment, since vertices 
with a higher in-degree will have a larger sphere of influence. A model for on-line 
social networks based on similar principles can be found in [11|5]. 

A number of spatial models have been proposed recently [H], [10], [11], [19], [12]. In
these models, as in the SPA model, the relationship between spatial distance and 
link formation is determined by a threshold function: a link is possible if vertices 
are within a prescribed threshold distance of each other, and impossible otherwise. 
However, for these models the threshold distance remains constant throughout the
process; it does not depend on the degree or decrease with time, as it does in the SPA
model.

A different class of graphs explores the interplay between distance and edge 
likelihood — with associated graph properties — with more involved mechanisms 
than simple thresholds: for example, in [23], each new vertex is born with m
edges, each joining a neighbour with probability proportional to the in-degrees 
and a function of the distance between them. Variations include the deterministic 
model PI] in which edges are formed based on the "utility" for the nodes in question,
utility incorporating both in-degree and distance; in [S], the model demands
that the number of nodes per unit volume is constant, and an analysis of the distribution
of edge lengths is also included. Beyond the creation of models, [17] takes
a closer look at the concept of complex networks having an underlying geometry.
For a recent survey of spatial models, see [H].

Our first main result shows that, for the SPA model, the number of common in- 
neighbours between a pair of vertices can, in many cases, be used to estimate the 
distance between the vertices. Since the metric distance is assumed to represent 
the similarity or "closeness" of the entities represented by the vertices, this means 
that it is possible to estimate similarity between vertices by looking at the graph 
only, i.e. without considering the underlying reality represented by the metric
space. The number of common in-neighbours in a citation graph is known in
library science as the measure of co-citation, and is one of the earliest measures
of graph-based similarity, proposed by Small in 1973 in [22]. Co-citation, and 
the related measure of bibliographic coupling (from [15]) based on the number 
of common out-neighbours, are widely used link similarity measures for scientific 
papers, via the citation graph, for Web pages, and others [H], [H], [20], [18].

The question of determining similarity between vertices is one that is central 
to many link mining applications. It is an important tool in searching, by finding 
documents or Web pages that are similar to a given target document. It can also 
be used as the basis to identify communities, or clusters, of similar vertices. A 
purely graph-based measure of similarity can be used as a complementary indi- 
cation of similarity between vertices when other information is unreliable (as is 
often the case in the World Wide Web), largely unavailable (as in some biological 
networks and online social networks), or protected by privacy laws (as in networks 
representing phone calls or bank transactions). 

Our result on the relationship between number of common neighbours and met- 
ric distance is derived theoretically through an analysis of the SPA model. The 
analytic result is asymptotic in the size of the graph. In order to test the result 
on realistic graph sizes, we performed simulations for graphs of 100,000 vertices, 
with various parameter choices. The simulations show that the real distance and 
the predicted distance from the number of common neighbours are in very good 
agreement. 

Our second main result determines the distribution of the edge lengths, where 
the length of an edge is the metric distance between its end points. Edge length 
is a metric property of a graph feature, and edge length distribution is a com- 
bined metric/graph property which is unique to spatial graph models. In the SPA 
model, the maximum length of an edge is determined by the size of the sphere 
of influence of its destination vertex, and this size is determined by the degree of 
the vertex. Since the degrees follow a power law, we might expect that the edge 
length distribution follows a power law. We show, both through theoretical results 
and simulations, that the situation is slightly more complex. In fact, we present 
clear evidence that, for a certain combination of model parameters, there are three 
different regimes of the distribution. For the smallest edge lengths, the cumulative 
edge length distribution is constant: almost all edges fall in this category. In the 
mid range, we have a power law with coefficient between 0 and 1, and in the tail,
we have a power law with exponent greater than 1. 

In Section 2, we describe the SPA model and derive some properties on the de- 
gree of a vertex which we will need to establish our results. In Section 3, we give 
the result on common in-neighbours and metric distance, and present the simula- 
tion results. In Section 4, we state our theorem on edge length distribution, and 
present the edge length distribution as obtained through simulations for various 
parameters. In Section 5, we give proofs of all the main theorems.

2. The SPA model

We start by giving a precise description of the SPA model, and deriving some 
facts about the degrees of the vertices, which we will need to prove our main 
results. In [1], the model is defined for a variety of metric spaces S. In this paper,
we let S be the unit hypercube in $\mathbb{R}^m$, equipped with the torus metric derived
from any of the $L_p$ norms. This means that for any two points x and y in S,

$$d(x, y) = \min\{\, \|x - y + u\|_p : u \in \{-1, 0, 1\}^m \,\}.$$

The torus metric thus "wraps around" the boundaries of the unit hypercube; this
metric was chosen to eliminate boundary effects. Let $c_m$ be the constant of
proportionality of volume used with the m-th power of the radius in m dimensions,
so the volume of a ball of radius r in m-dimensional space with the given metric
equals $c_m r^m$. For example, for the Euclidean metric, $c_2 = \pi$, and for the product
metric derived from $L_\infty$, $c_m = 2^m$.
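
As a rough illustration of the torus metric, the sketch below computes d(x, y) for two points of the unit hypercube given as coordinate tuples. The function name `torus_distance` and the convention that `p = math.inf` selects the $L_\infty$ norm are assumptions made for this example. The only fact used is that, because an $L_p$ norm is monotone in the absolute value of each coordinate, the minimum over u can be taken coordinate by coordinate.

```python
import math

def torus_distance(x, y, p=2.0):
    """Torus distance between points x and y of [0, 1]^m under the L_p norm."""
    wrapped = []
    for a, b in zip(x, y):
        delta = abs(a - b)
        # Minimising |a - b + u| over u in {-1, 0, 1} gives min(delta, 1 - delta).
        wrapped.append(min(delta, 1.0 - delta))
    if math.isinf(p):
        return max(wrapped)
    return sum(w ** p for w in wrapped) ** (1.0 / p)
```

For instance, the points (0.1, 0.5) and (0.9, 0.5) are at Euclidean torus distance 0.2 rather than 0.8, since the shorter path wraps around the boundary.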

The parameters of the model consist of the link probability $p \in [0, 1]$, and two
positive constants $A_1$ and $A_2$. The SPA model generates stochastic sequences of
graphs $(G_t : t \ge 0)$, where $G_t = (V_t, E_t)$, and $V_t \subseteq S$. Let $\deg^-(v,t)$ be the
in-degree of vertex v in $G_t$, and $\deg^+(v,t)$ its out-degree. We define the sphere of
influence $S(v,t)$ of vertex v at time $t \ge 1$ to be the ball centered at v with volume
$|S(v,t)|$ defined as follows:

$$|S(v,t)| = \frac{A_1 \deg^-(v,t) + A_2}{t}, \tag{1}$$

or $S(v,t) = S$ and $|S(v,t)| = 1$ if the right-hand side of (1) is greater than 1.

The process begins at t = 0, with $G_0$ being the null graph. Time-step t, $t \ge 1$,
is defined to be the transition between $G_{t-1}$ and $G_t$. At the beginning of each
time-step t, a new vertex $v_t$ is chosen uniformly at random from S, and added
to $V_{t-1}$ to create $V_t$. Next, independently, for each vertex $u \in V_{t-1}$ such that
$v_t \in S(u, t-1)$, a directed link $(v_t, u)$ is created with probability p. Thus, the
probability that a link $(v_t, u)$ is added in time-step t equals $p\,|S(u, t-1)|$.
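
The growth process just described can be sketched in a few lines. This is a minimal brute-force illustration, not the simulation code used for the experiments in this paper. It assumes the $L_\infty$ torus metric, so that spheres of influence are cubes and $c_m = 2^m$, and it assumes the sphere volume is $(A_1 \deg^- + A_2)/t$ capped at 1. The function name `simulate_spa`, the 0-based vertex indices, and the quadratic running time are choices of this sketch.

```python
import random

def simulate_spa(n, p, a1, a2, m=2, seed=0):
    """Grow an SPA graph for n time-steps; return (positions, in_degree, edges)."""
    rng = random.Random(seed)
    c_m = 2.0 ** m  # an L_infinity ball of radius r is a cube of volume (2r)^m
    positions, in_degree, edges = [], [], []
    for t in range(1, n + 1):
        new = tuple(rng.random() for _ in range(m))
        targets = []
        # Links are decided from the state of G_{t-1}, i.e. the spheres S(u, t-1).
        for u, pos in enumerate(positions):
            volume = min(1.0, (a1 * in_degree[u] + a2) / (t - 1))
            radius = (volume / c_m) ** (1.0 / m)
            dist = max(min(abs(a - b), 1.0 - abs(a - b)) for a, b in zip(new, pos))
            if dist <= radius and rng.random() < p:
                targets.append(u)
        for u in targets:
            in_degree[u] += 1
            edges.append((t - 1, u))  # directed link from the new vertex to u
        positions.append(new)
        in_degree.append(0)
    return positions, in_degree, edges
```

Every edge points from a younger vertex to an older one, so the in-degree of a vertex can only grow after its birth, as the analysis in this section assumes.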

We note that, to avoid the resulting graph becoming too dense, the parameters
must be chosen so that $pA_1 < 1$, as explained in [1]. In this paper, we assume that
the parameters meet this condition. Also, the original model as presented in [1]
has a third parameter, $A_3$, which is assumed to be zero here. This causes no loss
of generality, since all asymptotic results presented here are unaffected by $A_3$.

We now introduce some more definitions. In the rest of the paper, unless oth- 
erwise stated we will assume all asymptotics to refer to n going to infinity, where 
n is the end time of the growth process, and thus the final size of the graph. 
(As explained above, Theorem 2.1 is an exception.) We say that an event holds
asymptotically almost surely (a.a.s.) if the probability that it holds tends to one
as n goes to infinity. Similarly, we say that an event holds with extreme probability
(w.e.p.) if it holds with probability at least $1 - \exp(-\Theta(\log^2 n))$. Thus, if we consider a
polynomial number of events that each holds w.e.p., then w.e.p. all events hold.

It was shown in [1] that the SPA model produces graphs with a power law
degree distribution, with exponent $1 + 1/(pA_1)$. In [7], the (directed) diameter
of the model was investigated. For the results of this paper, we need a precise
expression for the expected in-degree of each vertex.

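Theorem 2.1. Let $\omega = \omega(t)$ be any function tending to infinity together with t.
The expected in-degree at time t of a vertex $v_i$ born at time $i \ge \omega$ is given by

$$E(\deg^-(v_i,t)) = (1 + o(1)) \frac{A_2}{A_1} \left(\frac{t}{i}\right)^{pA_1} - \frac{A_2}{A_1}. \tag{2}$$

Proof. In order to simplify calculations, we make the following substitution:

$$X(v_i,t) = \deg^-(v_i,t) + \frac{A_2}{A_1}. \tag{3}$$

It follows immediately from the definition of the process that

$$X(v_i,t+1) = \begin{cases} X(v_i,t) + 1, & \text{with probability } \dfrac{pA_1 X(v_i,t)}{t}, \\ X(v_i,t), & \text{otherwise.} \end{cases}$$

Therefore,

$$E(X(v_i,t+1) \mid X(v_i,t)) = (X(v_i,t) + 1)\frac{pA_1 X(v_i,t)}{t} + X(v_i,t)\left(1 - \frac{pA_1 X(v_i,t)}{t}\right) = X(v_i,t)\left(1 + \frac{pA_1}{t}\right),$$

and so

$$E(X(v_i,t+1)) = E(X(v_i,t))\left(1 + \frac{pA_1}{t}\right).$$

Since all vertices start with in-degree zero, $X(v_i,i) = A_2/A_1$. Since $i \ge \omega$, one can
use this to get

$$E(X(v_i,t)) = \frac{A_2}{A_1} \prod_{j=i}^{t-1} \left(1 + \frac{pA_1}{j}\right) = (1 + o(1)) \frac{A_2}{A_1} \exp\left(pA_1 \sum_{j=i}^{t-1} \frac{1}{j}\right) = (1 + o(1)) \frac{A_2}{A_1} \exp\left(pA_1 \log\frac{t}{i}\right) = (1 + o(1)) \frac{A_2}{A_1} \left(\frac{t}{i}\right)^{pA_1},$$

and the assertion follows from (3). □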
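
The last step of the proof can be checked numerically. The sketch below, with hypothetical function names, iterates the recurrence $E(X(v_i,t+1)) = E(X(v_i,t))(1 + pA_1/t)$ from $X(v_i,i) = A_2/A_1$ and compares the result with the leading term $(A_2/A_1)(t/i)^{pA_1}$. The parameter values in the last line are arbitrary choices for illustration.

```python
def expected_x_exact(i, t, p, a1, a2):
    """E[X(v_i, t)] obtained by iterating the recurrence from time i to time t."""
    value = a2 / a1  # X(v_i, i): the in-degree is zero at birth
    for j in range(i, t):
        value *= 1.0 + p * a1 / j
    return value

def expected_x_asymptotic(i, t, p, a1, a2):
    """Leading term (A2/A1) * (t/i)^(p*A1) of the same expectation."""
    return (a2 / a1) * (t / i) ** (p * a1)

# Ratio of the exact product to the asymptotic form; it approaches 1 as i grows.
ratio = expected_x_exact(1000, 100000, 0.7, 1.0, 1.0) / expected_x_asymptotic(1000, 100000, 0.7, 1.0, 1.0)
```

The ratio differs from 1 by a term of order 1/i, which is why the theorem requires the birth time i to tend to infinity.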

Theorem 2.1 states that the expected in-degree of an individual vertex born
at time i is asymptotically equal to $\frac{A_2}{A_1}(t/i)^{pA_1} - \frac{A_2}{A_1}$, with an error term of order
$o((t/i)^{pA_1})$. (The asymptotics assume that t is going to infinity, and i is a growing
function of t.) However, the in-degree of an individual vertex is not concentrated
around its expected value. This is due to variation happening shortly after birth;
whether or not the vertex receives in-links in the first few time steps after its
birth greatly affects the size of its sphere of influence throughout the process, and
therefore its final in-degree.

We can circumvent this difficulty by considering the final in-degree of the vertex,
and inferring the growth history of the in-degree from there. Namely, from the
in-degree of the vertex at end time n, we can obtain sharp bounds on the in-degree of
the vertex during most of the process. This is expressed in the following theorem.
First, define an injective function $f : \mathbb{R} \to \mathbb{R}$ by

$$f(i) = \frac{A_2}{A_1} \left(\frac{n}{i}\right)^{pA_1},$$

so $f(i)$ is the expected in-degree, at time n, of a vertex born at time i (up to a
multiplicative factor of $(1 + o(1))$). Thus $f^{-1}(k)$ is the birth time of a vertex of
final in-degree k, had the in-degree of the vertex remained close to its expected
value during its entire lifetime. Moreover, the (asymptotic) expected in-degree
at time t of a vertex born at time i can be given as $(A_2/A_1) f(i)/f(t)$ (provided
that $i = i(n)$ tends to infinity with n). Thus, if a vertex of final in-degree k has
in-degree growth close to its expected value, then

$$t = f^{-1}\left(\frac{A_2 k}{A_1 a}\right)$$

will be the approximate time when that vertex has in-degree a. The precise
statement and proof of this discussion follow below in Theorem 2.2, the main
result of this section.
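
The quantities in this discussion are easy to compute. The sketch below assumes that f has the form $f(i) = (A_2/A_1)(n/i)^{pA_1}$, which is the leading term of Theorem 2.1 evaluated at t = n; the function names `f`, `f_inv` and `time_at_degree` are hypothetical.

```python
def f(i, n, p, a1, a2):
    """Expected final in-degree (leading term) of a vertex born at time i."""
    return (a2 / a1) * (n / i) ** (p * a1)

def f_inv(k, n, p, a1, a2):
    """Birth time of a vertex whose expected final in-degree is k."""
    return n * (a2 / (a1 * k)) ** (1.0 / (p * a1))

def time_at_degree(k, a, n, p, a1, a2):
    """Approximate time at which a vertex of final in-degree k has in-degree a."""
    return f_inv(a2 * k / (a1 * a), n, p, a1, a2)
```

Under this form of f, `time_at_degree` simplifies to $n (a/k)^{1/(pA_1)}$, which agrees with the growth rate $k (t/n)^{pA_1}$ of the in-degree.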

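Theorem 2.2. Let $\omega = \omega(n)$ be any function tending to infinity together with n.
The following statement holds a.a.s. for every vertex v for which
$\deg^-(v,n) = k = k(n) \ge \omega \log n$. Let $i = f^{-1}(k)$, and let

$$t_k = f^{-1}\left(\frac{A_2 k}{A_1 \omega \log n}\right).$$

Then, for all values of t such that $t_k \le t \le n$,

$$\deg^-(v,t) = (1 + o(1)) \frac{A_2}{A_1} \frac{f(i)}{f(t)} = (1 + o(1)) \frac{A_2}{A_1} \left(\frac{t}{i}\right)^{pA_1} = (1 + o(1))\, k \left(\frac{t}{n}\right)^{pA_1}. \tag{4}$$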

The theorem implies that once a given vertex accumulates $\omega \log n$ in-neighbours,
the rest of the process (until time-step n) can be predicted with high probability;
in fact, a.a.s. we get a concentration around the expected value. Let us mention
that it seems that the $\omega$ factor is needed to get a concentration result. However,
without this factor, the order of the in-degrees can still be predicted: once the
vertex has $\log n$ in-neighbours, we can bound the in-degree of this vertex so that
the ratio between upper and lower bounds would a.a.s. be a constant.

In order to prove Theorem 2.2, we need strong results on the concentration of
the in-degree throughout the process. These results, and the proof of the theorem,
are given in Section 5.

3. Number of common neighbours and spatial distance

The principles of the SPA model make it plausible that vertices that are close to- 
gether in space will have a relatively high number of common neighbours. Namely, 
if two vertices are close together, their spheres of influence will overlap during most 
of the process, and any new vertex falling in the intersection of both spheres has 
the potential to become a common neighbour. Thus, the number of common 
neighbours (co-citation) should lead to a reliable measure of closeness in the met- 
ric space. In this section, we will quantify the relation between spatial distance 
and number of common in-neighbours, and show how it can be used to estimate 
distance. 

The term "common neighbour" here refers to common in-neighbours. Precisely, 
a vertex w is a common neighbour of vertices u and v if there exist directed links 
from w to u and from w to v. Note that in our model this can only occur if w is 
younger than u and v, and, at its birth, w lies in the intersection of the spheres 
of influence of u and v. We use cn(u, v, t) to denote the number of common
in-neighbours of u and v at time t. 



Theorem 3.1 distinguishes three cases. The division into cases is based on the
trend, as shown in Theorem 2.2, that spheres of influence tend to shrink over time.
Thus, once the spheres of influence of two vertices have become disjoint, and their
boundaries have some distance between them, it is not likely that they will overlap
at any time after that. The cases therefore are distinguished by how the spheres of
influence of u and v overlap, and when or whether they become disjoint. Figure 1
gives a pictorial representation of the three cases. Consider two vertices u and v
so that v has smaller in-degree at time n than u. Thus, the sphere of influence of
v tends to be smaller than that of u, and the likely birth time of u is before that
of v.

In Case 1, u and v are so far apart that their spheres of influence never overlap, 
except maybe for a negligible initial time period near their birth. In this case, no 
vertex can fall in the spheres of influence of both u and v, and thus u and v will 
acquire no common neighbours after the initial time period. Thus, they will have 
negligibly few common neighbours. In this case, accurate prediction of the
spatial distance between u and v is not possible: if u and v have very few common 
neighbours, we can only give a lower bound on their distance. 

In Case 2, u and v are so close that the sphere of influence of v is contained
within the sphere of influence of u for almost all of its existence. In this case, the
number of common neighbours of u and v is a constant proportion of the degree
of v, due to the fact that each new vertex linking to v will automatically be within
the sphere of influence of u, and thus can link to u as well (and does so with
probability p). This means that u and v are too close for accurate prediction: if
cn(u, v, n) and $\deg^-(v,n)$ differ by a factor close to p, we can only give an upper
bound on the spatial distance between u and v.

In Case 3, the sphere of influence of v is contained in that of u near the birth
of v, but the spheres become disjoint before the end of the process. The moment
at which the separation occurs can be determined fairly precisely, and depends
heavily on the distance between u and v. Thus, for this case we have a formula
for the number of common neighbours which involves the distance between u and
v, and the in-degree of both u and v at the end of the process. Reversing the
formula, we can obtain a reliable estimate for the distance between u and v from
the observable graph properties cn(u, v, n), $\deg^-(u)$ and $\deg^-(v)$.

Figure 1. The three cases of Theorem 3.1: too far (Case 1), too close (Case 2), and just right (Case 3). Each case is drawn near the birth of v and near the end of the process.

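Theorem 3.1. Let $\omega = \omega(n)$ be any function tending to infinity together with n.
The following holds a.a.s. Let $v_k$ and $v_\ell$ be vertices such that

$$k = \deg^-(v_k,n) \ge \deg^-(v_\ell,n) = \ell \ge \omega^2 \log n$$

in a graph generated by the SPA model. Let $d = d(v_k, v_\ell)$ be the distance between
$v_k$ and $v_\ell$ in the metric space. Finally, let $T = f^{-1}(\ell/(\omega \log n))$. Then,

Case 1. If $d \ge \varepsilon (\omega \log n / T)^{1/m}$ for some $\varepsilon > 0$, then

$$\mathrm{cn}(v_\ell, v_k, n) = O(\omega \log n).$$

Case 2. If $k \ge (1 + \varepsilon)\ell$ for some $\varepsilon > 0$ and

$$d \le \left(\frac{A_1 k + A_2}{c_m n}\right)^{1/m} - \left(\frac{A_1 \ell + A_2}{c_m n}\right)^{1/m}, \tag{5}$$

then

$$\mathrm{cn}(v_\ell, v_k, n) = (1 + o(1))\, p\ell.$$

If $k = (1 + o(1))\ell$ and $d \ll (k/n)^{1/m} = (1 + o(1))(\ell/n)^{1/m}$, then
$\mathrm{cn}(v_\ell, v_k, n) = (1 + o(1))\, p\ell$ as well.

Case 3. If $k \ge (1 + \varepsilon)\ell$ for some $\varepsilon > 0$ and

$$\left(\frac{A_1 k + A_2}{c_m n}\right)^{1/m} - \left(\frac{A_1 \ell + A_2}{c_m n}\right)^{1/m} < d \ll \left(\frac{\omega \log n}{T}\right)^{1/m}, \tag{6}$$

then

$$\mathrm{cn}(v_\ell, v_k, n) = C\, i_k^{-\frac{(pA_1)^2}{1 - pA_1}}\, i_\ell^{-pA_1}\, d^{-\frac{m pA_1}{1 - pA_1}} \left(1 + O\left(\left(\frac{i_k}{i_\ell}\right)^{pA_1/m}\right)\right), \tag{7}$$

where $i_k = f^{-1}(k)$ and $i_\ell = f^{-1}(\ell)$ and $C = p A_1^{-1} A_2^{\frac{1}{1 - pA_1}} c_m^{-\frac{pA_1}{1 - pA_1}}$.

If $k = (1 + o(1))\ell$ and $\varepsilon (k/n)^{1/m} < d \ll (\omega \log n / T)^{1/m}$ for some $\varepsilon > 0$,
then

$$\mathrm{cn}(v_\ell, v_k, n) = \Theta\left(i_k^{-\frac{pA_1}{1 - pA_1}}\, d^{-\frac{m pA_1}{1 - pA_1}}\right).$$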
The importance of the theorem is that (7) gives a relationship between the
distance between the vertices, their number of common neighbours, and their 
degrees. Since the number of common neighbours and the degrees are observable 
from the graph, the equation allows us to obtain an estimate for the (spatial) 
distance between the vertices using only basic graph parameters. 

We tested the predictive power of our theoretical results on data obtained from
simulations. The data was obtained from a graph with 100,000 vertices. The graph
was generated from points randomly distributed in the unit square in $\mathbb{R}^2$ according
to the SPA model described in Section 2, with n = 100,000 and p = 0.95, and
$A_1 = A_2 = 1$.

First of all, we show that a blind approach to using the co-citation measure
(number of common neighbours) does not work. In Figure 2, we plot spatial
distance versus number of common neighbours without further processing. No
relation between the two is apparent.






Figure 2. Actual distance vs. number of common neighbours. 



Next, we apply Theorem 3.1 to estimate the spatial distance between two vertices,
based on the number of common neighbours of the pair. (The spatial distance
is the actual distance between the points in the metric space, which for our
simulation is the distance obtained from the Euclidean torus metric on the unit
square.) From Cases 1 and 2, we can only obtain a lower and upper bound on
the distance, respectively. In order to eliminate Case 1 (too far), we consider only 
pairs that have at least 20 common neighbours. This reduces the data to 19,200 
pairs. For pairs of vertices in Case 2 (too close), the number of common neigh- 
bours equals p times the lowest degree of the pair. In order to eliminate this case, 
we require that the number of common neighbours should be less than p/2 times 
the lowest degree of the pair. This reduces the data set to 2,400 pairs. We expect 
these pairs mainly to be in Case 3. 
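
The two filters described above can be sketched as follows. The tuple layout `(cn, deg_u, deg_v)` and the function name `eligible_pairs` are assumptions of this example; the thresholds (at least 20 common neighbours, and fewer than p/2 times the lower in-degree) are the ones stated in the text.

```python
def eligible_pairs(pairs, p, min_common=20):
    """Keep the pairs that are expected to fall in Case 3.

    pairs: iterable of (cn, deg_u, deg_v), where cn is the number of common
    in-neighbours and deg_u, deg_v are the final in-degrees of the two vertices.
    """
    kept = []
    for cn, deg_u, deg_v in pairs:
        far_enough_from_case_1 = cn >= min_common
        far_enough_from_case_2 = cn < (p / 2.0) * min(deg_u, deg_v)
        if far_enough_from_case_1 and far_enough_from_case_2:
            kept.append((cn, deg_u, deg_v))
    return kept
```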

For pairs in Case 3, we can derive an estimate of the distance. Consider two
such vertices $v_\ell$ and $v_k$, with final in-degree $\ell$ and k, respectively. We base our
estimate on Equation (7), where we ignore the multiplicative $(1 + O((i_k/i_\ell)^{pA_1/m}))$
error term. Namely, when k and $\ell$ are of the same order, this expression is
the average of the lower and upper bound as derived in the proof of the theorem,
and when $\ell \ll k$ the term is asymptotically negligible. The estimated distance $\hat{d}$
between nodes $v_\ell$ and $v_k$, given that their number of common neighbours equals
N, is then given by

$$\hat{d} = C\, i_k^{-\frac{pA_1}{m}}\, i_\ell^{-\frac{1 - pA_1}{m}}\, N^{-\frac{1 - pA_1}{m pA_1}},$$

where $i_k = f^{-1}(k)$ and $i_\ell = f^{-1}(\ell)$ and $C = (p/A_1)^{\frac{1 - pA_1}{m pA_1}} A_2^{\frac{1}{m pA_1}} c_m^{-1/m}$.
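
A minimal sketch of this estimator follows. It assumes that the leading term of Equation (7) is $C\, i_k^{-(pA_1)^2/(1-pA_1)}\, i_\ell^{-pA_1}\, d^{-m pA_1/(1-pA_1)}$ with $C = p A_1^{-1} A_2^{1/(1-pA_1)} c_m^{-pA_1/(1-pA_1)}$, and that $f^{-1}(k) = n (A_2/(A_1 k))^{1/(pA_1)}$. The function names are hypothetical, and the default $c_m = \pi$ corresponds to the Euclidean metric with m = 2.

```python
import math

def birth_time(k, n, p, a1, a2):
    """f^{-1}(k): birth time of a vertex with expected final in-degree k."""
    return n * (a2 / (a1 * k)) ** (1.0 / (p * a1))

def predicted_common_neighbours(d, k, ell, n, p, a1, a2, m=2, c_m=math.pi):
    """Leading term assumed for Equation (7), without the error factor."""
    a = p * a1
    i_k, i_l = birth_time(k, n, p, a1, a2), birth_time(ell, n, p, a1, a2)
    const = (p / a1) * a2 ** (1.0 / (1.0 - a)) * c_m ** (-a / (1.0 - a))
    return const * i_k ** (-a * a / (1.0 - a)) * i_l ** (-a) * d ** (-m * a / (1.0 - a))

def estimate_distance(common, k, ell, n, p, a1, a2, m=2, c_m=math.pi):
    """Invert the leading term above to estimate d from graph data only."""
    a = p * a1
    i_k, i_l = birth_time(k, n, p, a1, a2), birth_time(ell, n, p, a1, a2)
    const = (p / a1) ** ((1.0 - a) / (m * a)) * a2 ** (1.0 / (m * a)) * c_m ** (-1.0 / m)
    return const * i_k ** (-a / m) * i_l ** (-(1.0 - a) / m) * common ** (-(1.0 - a) / (m * a))
```

The two functions are inverse to each other in d, so feeding the predicted count back into `estimate_distance` recovers the distance that produced it.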

Figure 3 shows actual vs. estimated distance for these pairs. The estimated
distance (on the y-axis), is computed using only data obtainable from the graph: 
the in-degrees of both vertices, and their number of common neighbours. This is 
compared to the actual distance (on the x-axis), known from the simulation. We 
see almost perfect agreement between estimate and reality. 




Figure 3. Actual distance (x-axis) vs. estimated distance (y-axis)
for eligible pairs from simulated data, calculated using the in-degree 
of both vertices. 

The figure shows that the scatter away from the diagonal is confined to points
below the diagonal. This means that, for the corresponding pairs, the estimate $\hat{d}$
is lower than the actual distance. This is due to the choice to base our estimate on
the average between the lower bound obtained from $t^-$, the estimated time when
the sphere of influence of $v_\ell$ first touches the boundary of the sphere of influence
of $v_k$, and the upper bound derived from $t^+$, when the spheres of influence of $v_\ell$
and $v_k$ first become disjoint.

The probability that a neighbour of $v_\ell$ born between $t^-$ and $t^+$ becomes a
common neighbour of $v_k$ and $v_\ell$ depends on the fraction of the sphere of influence
of $v_\ell$ which lies inside the sphere of influence of $v_k$. If the curvature of the sphere
of influence of $v_k$ is negligible so that the boundary locally resembles a line, and
if the sphere of influence of $v_\ell$ remains constant in size from $t^-$ to $t^+$, then the
average is a good estimate. However, both assumptions are notably false: the
curvatures of the spheres of influence of $v_\ell$ and $v_k$ may well be of the same order,
and the spheres of influence both shrink during the process. This implies that
the fraction of the sphere of influence of $v_\ell$ inside the sphere of influence of $v_k$ is
smaller than assumed near time $t^+$, and larger than assumed near $t^-$. Thus, the
true expected number of common neighbours will likely be larger than indicated
by the average. This leads to an underestimate of the distance (more common
neighbours is interpreted as closer distance).

In order to test our interpretation of the error in the estimation, we based the
estimator $\hat{d}$ on a convex combination of the lower bound L on the number of
common neighbours of vertices $v_k$ and $v_\ell$ given by $L = p \deg^-(v_\ell, t^-)$ and the
upper bound $U = p \deg^-(v_\ell, t^+)$. So the expected number of common neighbours is
assumed to be $(1 - c)L + cU$, which gives an expression involving d. Solving for d
gives our estimator $\hat{d}$. We found that the best fit was obtained for c = 0.005,
which means that the lower bound based on time $t^-$ gives the best indication of
the true number of common neighbours.
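
One way to carry out this computation is sketched below. The closed forms are assumptions of this sketch rather than formulas quoted from the proof: both spheres are taken to follow their expected growth, so a vertex born at time i has radius $(A_2 t^{pA_1 - 1} i^{-pA_1}/c_m)^{1/m}$ at time t; $t^-$ is the time at which the difference of the two radii equals d, $t^+$ the time at which their sum equals d; and $\deg^-(v_\ell, t)$ is approximated by $(A_2/A_1)(t/i_\ell)^{pA_1}$. Under these assumptions both L and U are proportional to $d^{-m pA_1/(1 - pA_1)}$, so the equation $(1 - c)L + cU = N$ can be solved for d directly. The function name is hypothetical.

```python
import math

def estimate_distance_convex(common, k, ell, n, p, a1, a2, c, m=2, c_m=math.pi):
    """Solve (1 - c) * L + c * U = common for d, with k > ell the final in-degrees."""
    a = p * a1
    i_k = n * (a2 / (a1 * k)) ** (1.0 / a)    # assumed birth time of v_k
    i_l = n * (a2 / (a1 * ell)) ** (1.0 / a)  # assumed birth time of v_ell
    if not i_k < i_l:
        raise ValueError("expected k > ell, so that v_k is the older vertex")
    base = (a2 / c_m) ** (1.0 / m)
    expo = m * a / (1.0 - a)
    # (t^-)^a * d^expo and (t^+)^a * d^expo under the expected-growth assumption.
    touch = (base * (i_k ** (-a / m) - i_l ** (-a / m))) ** expo
    apart = (base * (i_k ** (-a / m) + i_l ** (-a / m))) ** expo
    weight = (1.0 - c) * touch + c * apart
    return (p * (a2 / a1) * i_l ** (-a) * weight / common) ** (1.0 / expo)
```

Since the `apart` term exceeds the `touch` term, a larger value of c attributes more common neighbours to a given distance and therefore yields a larger distance estimate for the same observed count.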

The results for this adjusted estimator are given in Figure 4. As we can see, the
estimator is still not perfect; we conjecture that this is because the value of c that 
gives the best estimate is not uniform over all pairs, but depends on the relative 
sizes of the spheres of influence of the pair in the critical time interval. 







Figure 4. Actual distance (x-axis) vs. estimated distance (y-axis)
for eligible pairs from simulated data, using the adjusted estimator. 

4. Edge length distribution 

In this section we derive the edge length distribution; that is, the number of 
edges whose length is at least a given value x. The length of an edge is the (metric) 
distance between its endpoints. The edge distribution is a characteristic of spatial 
models. It will influence a number of graph properties, especially the diameter and 
the expansion properties. Long edges, even if they are rare, give the opportunity 
to jump to another locality in the metric space. It has been shown before (see,
for example, (TB]) that a small number of long edges can reduce the average path
length between vertices by a large factor.

In the SPA model, the degree distribution follows a power law, and the volume
of the spheres of influence is proportional to the degree of a vertex. The radius
of the sphere of influence determines the limit of the length of an edge to that
vertex. Thus, we expect the edge lengths to follow a power law as well. These
considerations lead us to consider all edges whose length exceeds a given value

$$r_\alpha = \left(\frac{n^{-\alpha}}{c_m}\right)^{1/m}.$$

(Recall that $c_m$ is the volume of an m-dimensional ball of unit radius.) Namely,
in this case we can limit our focus to those vertices whose sphere of influence has
volume at least $n^{-\alpha}$.

Fix $\alpha > 0$. An edge $(v,w) \in E(G)$ will be called a long edge if the edge length
$d(v,w) \ge r_\alpha$. We will study the random variable $e(\alpha)$, the number of long edges
in the graph. Formally,

$$e(\alpha) = |\{(v,w) \in E : d(v,w) \ge r_\alpha\}|.$$
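
Given the edge lengths of a generated graph, the count of long edges is a one-line filter. The sketch below assumes that $r_\alpha$ is the radius of a ball of volume $n^{-\alpha}$, that is, $r_\alpha = (n^{-\alpha}/c_m)^{1/m}$, as the discussion above indicates. The function names and the default $c_m = \pi$ (Euclidean metric, m = 2) are choices of this example.

```python
import math

def long_edge_threshold(alpha, n, m=2, c_m=math.pi):
    """Radius r_alpha of a ball whose volume is n^(-alpha)."""
    return (n ** (-alpha) / c_m) ** (1.0 / m)

def count_long_edges(edge_lengths, alpha, n, m=2, c_m=math.pi):
    """Number of edges whose length is at least r_alpha."""
    r_alpha = long_edge_threshold(alpha, n, m, c_m)
    return sum(1 for length in edge_lengths if length >= r_alpha)
```

Evaluating `count_long_edges` over a range of values of alpha gives the empirical curve that can be compared with the theoretical exponents.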

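Theorem 4.1. In the SPA Model with $1/2 < pA_1 < 1$, a.a.s. the number of long
edges is given by

$$e(\alpha) = \begin{cases} (1 + o(1)) \dfrac{pA_2}{1 - pA_1}\, n, & \text{if } \alpha > 1, \\[2ex] (1 + o(1))\, C\, n^{2 - \frac{1}{pA_1} + \alpha \frac{1 - pA_1}{pA_1}}, & \text{if } 1 - \dfrac{pA_1}{4pA_1 + 2} < \alpha < 1, \end{cases} \tag{8}$$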



where 
C = 



By [1], the total number of edges in graphs generated by the SPA model equals
$(1 + o(1)) \frac{pA_2}{1 - pA_1} n$. Thus, the first case of the theorem states that for $\alpha > 1$, $e(\alpha)$ is
approximately equal to the total number of edges. To see why this is so, consider
that, as $\alpha$ increases, the threshold for an edge to be classified as "long", namely
$r_\alpha$, decreases. If $\alpha > 1$, then $r_\alpha$ is so small that almost all edges are long.

The next range for $\alpha$, $1 - \frac{pA_1}{4pA_1 + 2} < \alpha < 1$, shows a linear relationship between
$\log e(\alpha)$ and $\log r_\alpha$. Namely, $m \log r_\alpha = (1 + o(1))(-\alpha) \log n$, and thus for this
range,

$$\log e(\alpha) = (1 + o(1)) \left(2 - \frac{1}{pA_1} + \alpha \frac{1 - pA_1}{pA_1}\right) \log n = (1 + o(1)) \left(\left(2 - \frac{1}{pA_1}\right) \log n - m\, \frac{1 - pA_1}{pA_1} \log r_\alpha\right). \tag{9}$$

Since $1/2 < pA_1 < 1$, the slope of the line giving the relationship between $\alpha$ and
$\log e(\alpha)$ lies between 0 and 1.

The theorem does not include a claim about the tail of the edge distribution,
when $\alpha$ becomes small, and thus $r_\alpha$ becomes relatively large. When
$1 - pA_1 < \alpha < 1 - \frac{pA_1}{4pA_1 + 2}$, the main contribution to $e(\alpha)$ comes from vertices that have very
high final degree (not moderately high, as before) and the long edges are created
till the very end of the process. Unfortunately, the number of vertices of very high
degree cannot be precisely controlled; from [1] we only have upper bounds and
lower bounds on the maximum degree that hold w.e.p. which differ by a factor
polylogarithmic in n. Therefore, it seems unlikely that $e(\alpha)$ is concentrated in this case.

When $\alpha < 1 - pA_1$, a.a.s. long edges cannot be created at the end of the process
but only until time $s = n^{\alpha/(1 - pA_1) + o(1)}$. The main contribution to the number of
long edges comes from those vertices that have very high degree at time s (and
have very high final degrees, of course). By a similar argument as given above,
the number of such vertices, and thus the value of $e(\alpha)$, is not likely to be highly
concentrated.

A different problem occurs when $pA_1 < 1/2$. The main contribution in this
case comes from vertices born at time $O(n^\alpha)$ and the long edges must have been
created when these vertices were still young, and had small degrees. Unfortunately,
the behaviour of the random variable representing the degree of a vertex is not
concentrated until the degree is $\omega \log n$. We expect $\Theta(n^\alpha)$ such edges but we
cannot control the behaviour of these vertices until the degree is large enough.

The following theorem fills in the missing case when $\alpha$ is small. However, the
results only apply to the expected value of $e(\alpha)$, and they give broad results about
the order of the exponent, instead of the finer results of the previous theorem.
The proofs of both theorems can be found in the last section of the paper.

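Theorem 4.2. For the SPA model, the logarithmic behaviour of the expected value
of $e(\alpha)$ is as follows.

For $1/2 < pA_1 < 1$,

$$\frac{\log \mathbb{E}(e(\alpha))}{\log n} = \begin{cases} 1 + o(1), & \text{if } \alpha > 1, \\ 2 - \frac{1}{pA_1} + \alpha\,\frac{1-pA_1}{pA_1} + o(1), & \text{if } 1 - pA_1 < \alpha < 1, \\ \alpha\,\frac{pA_1}{1-pA_1} + o(1), & \text{if } 0 < \alpha < 1 - pA_1. \end{cases}$$

For $pA_1 < 1/2$,

$$\frac{\log \mathbb{E}(e(\alpha))}{\log n} = \begin{cases} 1 + o(1), & \text{if } \alpha > 1, \\ \alpha + o(1), & \text{if } 0 < \alpha < 1. \end{cases}$$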
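The piecewise exponent of Theorem 4.2 can be tabulated directly. The following is a minimal sketch, not part of the original analysis: the function name is hypothetical, the $o(1)$ terms are dropped, and the boundary values $\alpha = 1$ and $pA_1 = 1/2$ (which the theorem does not cover) are handled by continuity as an assumption.

```python
def expected_long_edge_exponent(alpha: float, p_a1: float) -> float:
    """Leading-order value of log E(e(alpha)) / log n from Theorem 4.2.

    The o(1) terms are dropped. `p_a1` stands for the product p * A1.
    Boundary points not covered by the theorem are filled in by continuity
    (an assumption of this sketch).
    """
    if not 0.0 < p_a1 < 1.0:
        raise ValueError("Theorem 4.2 assumes 0 < p*A1 < 1")
    if alpha <= 0.0:
        raise ValueError("alpha must be positive")
    if alpha >= 1.0:
        # Almost all edges are long: exponent 1.
        return 1.0
    if p_a1 < 0.5:
        # Only two regimes when p*A1 < 1/2.
        return alpha
    if alpha > 1.0 - p_a1:
        # Middle regime, as in Equation (9).
        return 2.0 - 1.0 / p_a1 + alpha * (1.0 - p_a1) / p_a1
    # Tail regime, as in Equation (10).
    return alpha * p_a1 / (1.0 - p_a1)
```

For $pA_1 > 1/2$ the two lower regimes meet at $\alpha = 1 - pA_1$, where both expressions equal $pA_1$, so the predicted curve is continuous with a change of slope at that point.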

Thus, for the case where $pA_1 > 1/2$, the middle range of $e(\alpha)$ extends beyond
the lower bound on $\alpha$ for which precise results for $e(\alpha)$ can be obtained, and
there is a third range for small $\alpha$, namely $\alpha < 1 - pA_1$, for which the expected
relationship between $\log e(\alpha)$ and $\alpha$ is given by

$$\log e(\alpha) = (1+o(1))\,\frac{pA_1}{1-pA_1}\,\alpha \log n = -(1+o(1))\,\frac{m\,pA_1}{1-pA_1}\,\log r_\alpha. \tag{10}$$

Thus we have clear power law behaviour at the tail of the distribution, with
coefficient $\frac{pA_1}{1-pA_1} > 1$.

To verify our intuition that the real behaviour of the SPA model is similar to
the asymptotic results given by the theorems, we ran simulations. We generated
graphs of 100,000 nodes, in $S$ of dimension $m = 2$, for various values of $p$. $A_1$
and $A_2$ were both set to 1. The results are seen in Figures 5 and 6, where the
logarithm of the number of long edges has been plotted against a range of values
for $\alpha$. The straight lines in the figures represent the expected behaviour for the
three ranges of $\alpha$ as given by (9) and (10) (and a horizontal line for the behaviour
for large $\alpha$). To show the fact that the number of long edges decreases as the
threshold increases, the x-axis gives the values of $-\alpha$.
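An experiment of this kind can be sketched in a few lines. The code below is an illustration only and is not the simulation code used for the figures: it assumes the usual SPA rule that a vertex $u$ has a sphere of influence of volume $\min\{(A_1 \deg^-(u,t) + A_2)/t, 1\}$ and that a new vertex falling inside it links to $u$ with probability $p$; it assumes the unit square with the Euclidean torus metric; all function names are hypothetical; and it uses a naive quadratic loop suitable only for small $n$.

```python
import math
import random


def torus_distance(a, b):
    """Euclidean distance on the unit square with opposite sides identified."""
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    return math.hypot(min(dx, 1.0 - dx), min(dy, 1.0 - dy))


def simulate_spa_edge_lengths(n, p, a1=1.0, a2=1.0, seed=0):
    """Run n steps of a naive O(n^2) SPA process in dimension m = 2.

    Returns the metric length of every edge created.
    """
    rng = random.Random(seed)
    positions, in_degree, lengths = [], [], []
    for t in range(1, n + 1):
        new_pos = (rng.random(), rng.random())
        for u, pos in enumerate(positions):
            volume = min((a1 * in_degree[u] + a2) / t, 1.0)
            dist = torus_distance(new_pos, pos)
            # A sphere of volume 1 is taken to cover the whole space.
            inside = volume >= 1.0 or dist <= math.sqrt(volume / math.pi)
            if inside and rng.random() < p:
                in_degree[u] += 1
                lengths.append(dist)
        positions.append(new_pos)
        in_degree.append(0)
    return lengths


def count_long_edges(lengths, n, alpha):
    """Number of edges of length at least r_alpha.

    r_alpha is chosen so that a ball of radius r_alpha has volume n^(-alpha).
    """
    r_alpha = math.sqrt(n ** (-alpha) / math.pi)
    return sum(1 for d in lengths if d >= r_alpha)
```

Plotting $\log$ of `count_long_edges(lengths, n, alpha)` against $-\alpha$ for a grid of $\alpha$ values gives a curve of the kind described above; at small $n$ the $o(1)$ terms are far from negligible, so only the qualitative shape should be expected to match.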



Figure 5. Long Edges Simulation vs. Theory, SPA Model. Parameters:
$n = 100{,}000$, $A_1 = A_2 = 1$, $m = 2$, $p = 0.7$ (left) and $p = 0.8$ (right).



Figure 5 shows two values in the range $1/2 < pA_1 < 1$. For both cases, the
theoretical results expressed in Equations (9) and (10) give a good approximation of
the envelope of the curve represented by the simulated values. Not surprisingly,
near the threshold $\alpha = 1 - pA_1$, the simulated version shows smooth behaviour
that is a blend between the behaviour on both sides of the threshold. The angle of
the tail of the distribution has good agreement with the value predicted from the
modified model.



Figure 6. Long Edges Simulation vs. Theory, SPA Model. Parameters:
$n = 100{,}000$, $A_1 = A_2 = 1$, $m = 2$, $p = 0.3$ (left), and $p = 0.5$ (right).

In Figure 6 we give a simulation result for the case where $pA_1 < 1/2$. Here the
modified model predicts only two regimes, which is borne out by the simulation
data. We also include a picture for the case $pA_1 = 1/2$. At this cross-over value,
no linear relationship between $\log e(\alpha)$ and $\alpha$ can be observed from the picture.
However, our theoretical results predict that for larger values of $n$, the curve should
approach a straight line with slope —a. 

5. Proofs of the main theorems

5.1. Degree of a vertex. The first part of this section is devoted to the proof of
Theorem 2.2. We will be using the following version of the well-known Bernstein
inequalities many times, so let us state it explicitly.

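Lemma 5.1 ([13]). Let $X$ be a random variable that can be expressed as a sum
$X = \sum_{i=1}^{n} X_i$ of independent random indicator variables where $X_i \in \mathrm{Be}(p_i)$ with
(possibly) different $p_i = \mathbb{P}(X_i = 1) = \mathbb{E}X_i$. Then the following holds for $t \ge 0$:

$$\mathbb{P}(X \ge \mathbb{E}X + t) \le \exp\left(-\frac{t^2}{2(\mathbb{E}X + t/3)}\right),$$

$$\mathbb{P}(X \le \mathbb{E}X - t) \le \exp\left(-\frac{t^2}{2\,\mathbb{E}X}\right).$$

In particular, if $\varepsilon \le 3/2$, then

$$\mathbb{P}(|X - \mathbb{E}X| \ge \varepsilon\,\mathbb{E}X) \le 2\exp\left(-\frac{\varepsilon^2\,\mathbb{E}X}{3}\right). \tag{11}$$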
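As a sanity check, the two tail bounds of Lemma 5.1 can be compared with exact tail probabilities in the special case where all $p_i$ are equal, so that $X$ is binomial. This is a small numerical sketch with hypothetical function names; it illustrates the inequalities and is not part of the proof.

```python
import math


def upper_tail_bound(mean: float, t: float) -> float:
    """Bound on P(X >= EX + t) from Lemma 5.1."""
    return math.exp(-t * t / (2.0 * (mean + t / 3.0)))


def lower_tail_bound(mean: float, t: float) -> float:
    """Bound on P(X <= EX - t) from Lemma 5.1."""
    return math.exp(-t * t / (2.0 * mean))


def binomial_pmf(n: int, q: float, j: int) -> float:
    """P(X = j) for X ~ Bin(n, q)."""
    return math.comb(n, j) * q ** j * (1.0 - q) ** (n - j)


def binomial_upper_tail(n: int, q: float, threshold: float) -> float:
    """Exact P(X >= threshold) for X ~ Bin(n, q)."""
    start = max(0, math.ceil(threshold))
    return sum(binomial_pmf(n, q, j) for j in range(start, n + 1))


def binomial_lower_tail(n: int, q: float, threshold: float) -> float:
    """Exact P(X <= threshold) for X ~ Bin(n, q)."""
    end = min(n, math.floor(threshold))
    return sum(binomial_pmf(n, q, j) for j in range(0, end + 1))
```

For example, with $n = 200$ and $q = 0.1$ (so $\mathbb{E}X = 20$) and $t = 10$, `binomial_upper_tail(200, 0.1, 30)` can be compared with `upper_tail_bound(20, 10)`, and similarly for the lower tail.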

Now, we are ready to prove the following key observation. 

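Theorem 5.2. Suppose that $\deg^-(v,T) = d \ge \omega \log n$, where $\omega = \omega(n)$ is any
function tending to infinity together with $n$. Then, for every value of $t$, $T \le t \le 2T$,
we get that

$$\left|\deg^-(v,t) - d\left(\frac{t}{T}\right)^{pA_1}\right| \le \frac{2}{pA_1}\cdot\frac{t}{T}\,\sqrt{d \log n}$$

with probability $1 - O(n^{-4/3})$.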



Proof. Let $\omega = \omega(n)$ be any function tending to infinity together with $n$. Suppose
that $\deg^-(v,T) = d \ge \omega \log n$. We will show that the upper bound holds; the
lower bound can be obtained by using an analogous symmetric argument.
Let us introduce the following stopping time:

$$T_0 = \min\left\{ t \ge T : \deg^-(v,t) > d\left(\frac{t}{T}\right)^{pA_1} + \frac{2}{pA_1}\cdot\frac{t}{T}\,\sqrt{d\log n} \ \vee\ t = 2T+1 \right\}.$$

A stopping time is any random variable $T_0$ with values in $\{0, 1, \ldots\} \cup \{\infty\}$ such
that it can be determined whether $T_0 = t^*$ for any time $t^*$ from knowledge of the
process up to and including time $t^*$. The name can be misleading, since a process
does not stop when it reaches a stopping time. Here, $T_0$ determines the first time
the process does not exhibit the bounded behaviour we wish to establish. The
condition $t = 2T + 1$ has been added to ensure that the set is never empty, and
thus $T_0$ is well-defined. If $T_0 = 2T + 1$, then the in-degree of $v$ remained bounded
as given during the entire time interval $T \le t \le 2T$. In order to prove the bound,
we need to show that with probability $1 - O(n^{-4/3})$ we have $T_0 = 2T + 1$.

Suppose that $T_0 \le 2T$. Note that for $t \ge T$ up to and including time-step
$T_0 - 1$, the random variable $\deg^-(v,t)$ is (deterministically) bounded from above,
and so the number of new neighbours accumulated during this phase of the process,
$\deg^-(v,T_0) - \deg^-(v,T)$, can be (stochastically) bounded from above by the sum
$X = \sum_{t=T}^{T_0-1} X_t$ of independent indicator random variables $X_t$ with

$$\mathbb{P}(X_t = 1) = p\,\frac{A_1\left(d\left(\frac{t}{T}\right)^{pA_1} + \frac{2}{pA_1}\cdot\frac{t}{T}\sqrt{d\log n}\right) + A_2}{t}.$$

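Hence,

$$\mathbb{E}\deg^-(v,T_0) \le d + \mathbb{E}X = d + \sum_{t=T}^{T_0-1} \mathbb{P}(X_t = 1) = d\left(\frac{T_0}{T}\right)^{pA_1} + \frac{T_0 - T}{T}\,2\sqrt{d\log n} + O(1).$$

This implies that

$$\deg^-(v,T_0) - \mathbb{E}\deg^-(v,T_0) \ge \frac{2}{pA_1}\cdot\frac{T_0}{T}\sqrt{d\log n} - \frac{T_0 - T}{T}\,2\sqrt{d\log n} - O(1) \ge 2\sqrt{d\log n},$$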



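and it follows from the bound (11) that

$$\mathbb{P}\left(|X - \mathbb{E}X| \ge 2\sqrt{d\log n}\right) \le 2\exp\left(-\frac{\varepsilon^2\,\mathbb{E}X}{3}\right),$$

where $\varepsilon = 2\sqrt{d\log n}/\mathbb{E}X$. Since the maximum value of $\mathbb{E}X$ corresponds to $T_0 = 2T$,
it follows that $\mathbb{E}X \le d(2^{pA_1} - 1)(1 + o(1)) \le d$, and so $\varepsilon \ge 2\sqrt{d^{-1}\log n}$.
Consequently $\varepsilon^2\,\mathbb{E}X/3 = 4d\log n/(3\,\mathbb{E}X) \ge \frac{4}{3}\log n$.
Therefore, the probability that $T_0 \le 2T$ is at most $2\exp(-\frac{4}{3}\log n) = O(n^{-4/3})$, and the proof
is finished. □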
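The growth rate $d(t/T)^{pA_1}$ that Theorem 5.2 concentrates around can be seen by iterating the expected one-step increment of the in-degree. The sketch below makes two assumptions that are not spelled out in this section: a vertex of in-degree $x$ gains a neighbour at time $t$ with probability $p(A_1 x + A_2)/t$ (sphere of influence of volume below 1), and the expectation is iterated as if the degree were deterministic. The function names are hypothetical.

```python
import math


def expected_in_degree(d: float, start: int, end: int,
                       p: float, a1: float, a2: float) -> float:
    """Iterate x <- x + p * (a1 * x + a2) / t for t = start, ..., end - 1."""
    x = float(d)
    for t in range(start, end):
        x += p * (a1 * x + a2) / t
    return x


def theorem_5_2_envelope(d: float, start: int, t: int,
                         p: float, a1: float, n: int) -> float:
    """Deviation allowed by Theorem 5.2: (2 / (p*A1)) * (t / T) * sqrt(d * log n)."""
    return (2.0 / (p * a1)) * (t / start) * math.sqrt(d * math.log(n))
```

With $d = 200$, $T = 1000$, $p = 0.5$ and $A_1 = A_2 = 1$, the value of `expected_in_degree(200, 1000, 2000, 0.5, 1.0, 1.0)` is close to $200 \cdot 2^{1/2}$, and the difference is far smaller than the envelope $\frac{2}{pA_1}\cdot\frac{t}{T}\sqrt{d\log n}$.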



Now, with Theorem 5.2 in hand we can easily get Theorem 2.2. For a given vertex
$v$ of degree $d \ge \omega \log n$ at time $T$ we obtain from Theorem 5.2 that, with probability
$1 - O(n^{-4/3})$,

$$d\left(\frac{t}{T}\right)^{pA_1}\left(1 - \frac{4}{pA_1}\sqrt{d^{-1}\log n}\right) \le \deg^-(v,t) \le d\left(\frac{t}{T}\right)^{pA_1}\left(1 + \frac{4}{pA_1}\sqrt{d^{-1}\log n}\right)$$

for $T \le t \le 2T$ (here we used that $(t/T)^{1-pA_1} \le 2$ on this interval). We can now keep applying the same theorem for times $2T$, $4T$,
$8T$, $16T$, ..., using the final value as the initial one for the next period, to get
the statement for all values of $t$ from $T$ up to and including time $n$. Since we
apply the theorem $O(\log n)$ times (for a given vertex $v$), the statement holds with
probability $1 - o(n^{-1})$ and so a.a.s. the statement we are about to prove will hold
for all vertices.

It remains to make sure that the accumulated multiplicative error is still only
$(1 + o(1))$. After applying the theorem recursively $i$ times the degree is shown
to be $d\,2^{pA_1 i}(1 + o(1))$. Using this rough estimate, and assuming the theorem is
applied for a total of $k = O(\log n)$ times, we get that the error term is, in fact,
bounded from above by

$$\prod_{i=0}^{k}\left(1 + \frac{4}{pA_1}\sqrt{d^{-1}2^{-pA_1 i}\log n}\right) = (1+o(1))\exp\left(\frac{4}{pA_1}\sqrt{d^{-1}\log n}\,\sum_{i=0}^{k} 2^{-pA_1 i/2}\right)$$

$$= (1+o(1))\exp\left(O\left(\sqrt{d^{-1}\log n}\right)\right) = 1 + o(1),$$

since $d$ grows faster than $\log n$. A symmetric argument can be used to show a
lower bound for the error term and so Theorem 2.2 holds.
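The effect of chaining $O(\log n)$ doubling steps can be checked numerically. The following sketch multiplies out per-round error factors of the form $1 + \frac{C}{pA_1}\sqrt{\log n/(d\,2^{pA_1 i})}$; the constant $C$ is a parameter of the sketch (its default value 4 is an assumption), and the function name is hypothetical. The point is only that the product stays close to 1 once $d$ is much larger than $\log n$, because the terms decay geometrically in $i$.

```python
import math


def accumulated_error_factor(d: float, n: int, p_a1: float,
                             const: float = 4.0) -> float:
    """Product of per-round error factors over about log2(n) doubling rounds.

    Round i starts from degree roughly d * 2^(p_a1 * i), so its relative
    error is (const / p_a1) * sqrt(log n / (d * 2^(p_a1 * i))).
    """
    rounds = math.ceil(math.log2(n))
    factor = 1.0
    for i in range(rounds + 1):
        degree_at_round = d * 2.0 ** (p_a1 * i)
        factor *= 1.0 + (const / p_a1) * math.sqrt(math.log(n) / degree_at_round)
    return factor
```

For instance, with $n = 10^6$ and $pA_1 = 0.5$, taking $d = 10^6 \log n$ gives a factor only slightly above 1, while $d = 100 \log n$ gives a much larger one.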



5.2. Number of common neighbours. The proof of Theorem 3.1, which gives
a formula for the number of common neighbours of two given vertices $v$ and $w$,
is based on the three cases explained in Section 3 and Figure 1. The division
into three cases is based on the trend, as shown in Theorem 2.2, that spheres of
influence tend to shrink over time. It can happen that spheres of influence that
are disjoint become overlapping at a later time instance, and thus do not fit any
of the three cases. However, this behaviour happens with low enough probability
that it does not affect our result.



Proof of Theorem 3.1. The proof depends heavily on Theorem 2.2. Any precise
reference to the theorem will therefore be omitted. We can assume that at time
$T$,

$$\deg(v_\ell,T) = (1 + o(1))\,\frac{A_2}{A_1}\,\omega \log n$$

and the degree of this vertex is as predicted by Theorem 2.2 until the end of the process (that
is, the ratio between the upper and lower bound on the degree is deterministically
equal to $(1 + o(1))$). Since $k \ge \ell$, the degree of $v_k$ for the time interval after $T$ is
given by Theorem 2.2 as well. Let $r(v,t)$ denote the radius of the sphere of influence around
$v$ at time $t$; that is, $r(v,t) = (|S(v,t)|/c_m)^{1/m}$.

Case 1: Suppose that $d \ge \varepsilon(\omega\log n/T)^{1/m}$ for some $\varepsilon > 0$. For $T \le t \le n$, we
can deduce from the expression for the degree of $v_\ell$ over time and the expression
for the volume $|S(v_\ell,t)|$ of the sphere of influence of $v_\ell$ that

$$r(v_\ell,t) = (1 + o(1))\left(\frac{A_2\,\omega\log n\,(t/T)^{pA_1}}{c_m\,t}\right)^{1/m}.$$

In particular, let us note that $d$ is of greater or equal order as $r(v_\ell,T)$, and hence
of greater or equal order as $r(v_k,T)$ as well. Moreover, both radii tend to be
decreasing from time $T$ on. (Formally what we mean is that $r(v_\ell,t) > r(v_\ell,t(1+\varepsilon))$
for any $\varepsilon > 0$ and $t \ge T$. When a vertex receives a new neighbour, its radius
slightly increases.) Therefore, there exists a constant $c = c(\varepsilon) > 0$ such that
$S(v_\ell,t)$ and $S(v_k,t)$ are disjoint for every $t \ge cT$ and so there is no chance
to create more common neighbours. Since at time $cT$ the degree of vertex $v_\ell$ is
$(1 + o(1))(A_2/A_1)\,c^{pA_1}\,\omega\log n = O(\omega\log n)$, we can apply an obvious upper bound
to get

$$cn(v_\ell,v_k,n) \le \deg(v_\ell,n) = O(\omega\log n).$$

Finally, note that it can happen that $cT > n$, which means that the process stops
before the spheres of influence become disjoint. This causes no problem since the
upper bound for the number of common neighbours at time $cT$ will then trivially
hold at time $n$.

Case 2: Suppose $k \ge (1 + \varepsilon)\ell$ for some $\varepsilon > 0$ and $d$ satisfies inequality (5).
Note that the condition for $d$ implies that at time $n$ the sphere of influence of $v_\ell$
is contained in that of $v_k$. Moreover, the radii of influence are proportionally
decreasing during the process from the time we start having concentrated behaviour
of degrees onwards (that is, from time $T$ on, in the sense explained earlier). So the
sphere of influence of $v_\ell$ is contained in the sphere of influence of $v_k$ from time $T$
to time $(1 + o(1))n$. Any vertex $u$ that links to $v_\ell$ lies inside the sphere of influence
of $v_\ell$, and thus of $v_k$ as well, and has a probability $p$ of also linking to $v_k$.

At the end of the process (for $t = (1 + o(1))n$) it can happen that the sphere
of influence $S(v_\ell,t)$ is not completely contained in $S(v_k,t)$, but it is the case that
they overlap to a large extent, namely

$$|S(v_\ell,t) \cap S(v_k,t)| = (1 + o(1))\,|S(v_\ell,t)|. \tag{12}$$

Thus, the probability that a neighbour of $v_\ell$, added during this phase of the
process, is also a neighbour of $v_k$ is $(1 + o(1))p$.

Therefore, $\mathbb{E}\,cn(v_\ell,v_k,n) = (1 + o(1))\,p\ell$, since the number of common neighbours
accumulated until time $T$ is $O(\omega\log n)$ and so is negligible.

Suppose now that $k = (1 + o(1))\ell$ and $d \ll (k/n)^{1/m}$. In this case, the radii
of $v_\ell$ and $v_k$ are approximately equal from time $T$ to the end of the process (that
is, they differ by a multiplicative factor of $(1 + o(1))$). Since $d$ is of order smaller
than the radii at the end of the process, property (12) holds for $T \le t \le n$ and
the result holds by the same argument as before.

Case 3: Suppose $k \ge (1 + \varepsilon)\ell$ for some $\varepsilon > 0$ and $d$ satisfies inequality (6).
Note that the condition for $d$ implies that at time $T$ the sphere of influence of $v_\ell$
is contained in that of $v_k$, but this is not the case at time $n$.

Let $t^-$ be the first moment when $S(v_\ell,t)$ is not completely contained in $S(v_k,t)$
($T < t^- < n$). Let $t^+$ be the last time when the spheres overlap ($t^- < t^+$). (Note
that it is possible that $t^+ > n$ but, as before, this causes no problem.) Up to time
$t^-$, each neighbour of $v_\ell$ will be a common neighbour of $v_\ell$ and $v_k$ with probability
$p$. From time $t^+$ to $n$, no common neighbours can be created. From time $t^-$ until
time $t^+$, the probability that a neighbour of $v_\ell$ becomes a neighbour of $v_k$ is at
most $p$. Thus, $p\deg^-(v_\ell,t^-)$ and $p\deg^-(v_\ell,t^+)$ form a lower and an upper bound,
respectively, on the expected number of common neighbours of the two vertices.

Note that at time $t^-$, $S(v_\ell,t^-)$ is contained in $S(v_k,t^-)$ and "touches" the
boundary from the inside (the distance between the boundaries at time $t^-$ may not
be exactly zero but certainly is $o(d)$). At time $t^+$, $S(v_\ell,t^+)$ is outside $S(v_k,t^+)$ but
"touches" the boundary from the outside. Since the centers of $S(v_\ell,t)$ and $S(v_k,t)$
are at distance $d$ from each other, this translates into the following expressions
involving $t^-$ and $t^+$:

$$r(v_k,t^-) - r(v_\ell,t^-) = (1 + o(1))\,d,$$
$$r(v_k,t^+) + r(v_\ell,t^+) = (1 + o(1))\,d.$$

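Using the concentration result about the in-degree, $\deg^-(v_k,t) = (1+o(1))\,k\,(t/n)^{pA_1}$ and
similarly for $v_\ell$, this translates into the following conditions on $t^-$ and $t^+$:

$$\left(\frac{A_1}{c_m}\right)^{1/m} n^{-pA_1/m}\,(t^{\mp})^{-(1-pA_1)/m}\left(k^{1/m} \mp \ell^{1/m}\right) = (1 + o(1))\,d,$$

and so

$$t^- = (1 + o(1))\left(\frac{A_1}{c_m}\right)^{\frac{1}{1-pA_1}} n^{-\frac{pA_1}{1-pA_1}}\,d^{-\frac{m}{1-pA_1}}\left(k^{1/m} - \ell^{1/m}\right)^{\frac{m}{1-pA_1}},$$

$$t^+ = (1 + o(1))\left(\frac{A_1}{c_m}\right)^{\frac{1}{1-pA_1}} n^{-\frac{pA_1}{1-pA_1}}\,d^{-\frac{m}{1-pA_1}}\left(k^{1/m} + \ell^{1/m}\right)^{\frac{m}{1-pA_1}}.$$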
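The two touching conditions can be solved and checked numerically. The sketch below uses the leading-order radius $r(v_k,t) = (A_1 k (t/n)^{pA_1}/(c_m t))^{1/m}$ with all $(1+o(1))$ factors dropped. It assumes, for illustration only, that $c_m$ is the volume of the Euclidean unit ball in dimension $m$; the function names are hypothetical.

```python
import math


def unit_ball_volume(m: int) -> float:
    """Volume of the Euclidean unit ball in dimension m (assumed choice of c_m)."""
    return math.pi ** (m / 2.0) / math.gamma(m / 2.0 + 1.0)


def radius(final_degree: float, t: float, n: float,
           p_a1: float, a1: float, m: int) -> float:
    """Leading-order radius of the sphere of influence at time t."""
    volume = a1 * final_degree * (t / n) ** p_a1 / t
    return (volume / unit_ball_volume(m)) ** (1.0 / m)


def touching_times(k: float, ell: float, d: float, n: float,
                   p_a1: float, a1: float, m: int):
    """Return (t_minus, t_plus) solving r_k - r_l = d and r_k + r_l = d."""
    c_m = unit_ball_volume(m)
    power = 1.0 / (1.0 - p_a1)
    base = (a1 / c_m) ** power * n ** (-p_a1 * power) * d ** (-m * power)
    root_k, root_l = k ** (1.0 / m), ell ** (1.0 / m)
    t_minus = base * (root_k - root_l) ** (m * power)
    t_plus = base * (root_k + root_l) ** (m * power)
    return t_minus, t_plus
```

For example, `touching_times(400, 100, 0.01, 1e6, 0.6, 1.0, 2)` returns a pair with $t^- < t^+$; in this instance $t^+$ exceeds $n$, which is the situation noted above as causing no problem.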




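The number of common neighbours of $v_k$ and $v_\ell$ is bounded from below by
$(1 + o(1))\,p\deg^-(v_\ell,t^-)$, and from above by $(1 + o(1))\,p\deg^-(v_\ell,t^+)$. Using our
knowledge about the behaviour of the in-degree of $v_\ell$, namely $\deg^-(v_\ell,t) = (1+o(1))\,\ell\,(t/n)^{pA_1}$, this leads to the following
bounds, which hold within a $(1 + o(1))$ term:

$$p\,\ell\left(\frac{A_1}{c_m n}\right)^{\frac{pA_1}{1-pA_1}} d^{-\frac{m\,pA_1}{1-pA_1}}\left(k^{1/m} - \ell^{1/m}\right)^{\frac{m\,pA_1}{1-pA_1}} \le \mathbb{E}\,cn(v_\ell,v_k,n) \le p\,\ell\left(\frac{A_1}{c_m n}\right)^{\frac{pA_1}{1-pA_1}} d^{-\frac{m\,pA_1}{1-pA_1}}\left(k^{1/m} + \ell^{1/m}\right)^{\frac{m\,pA_1}{1-pA_1}}.$$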
The result follows from the fact that 

Finally, consider the case where $k = (1 + o(1))\ell$, and thus $k/\ell = 1 + o(1)$. As
before, from time

$$t^+ = (1 + o(1))\left(\frac{A_1}{c_m}\right)^{\frac{1}{1-pA_1}} n^{-\frac{pA_1}{1-pA_1}}\,d^{-\frac{m}{1-pA_1}}\left(2k^{1/m}\right)^{\frac{m}{1-pA_1}}$$

until time $n$, the spheres are disjoint and there is no chance for a common
neighbour. At time $t$ such that $T \le t = o(t^+)$, the spheres overlap to a large extent
and (12) holds. However, for $\varepsilon > 0$ and $t$ such that $\varepsilon t^+ \le t \le t^+$, only a nontrivial
fraction of $S(v_\ell,t)$ is contained in $S(v_k,t)$. The above analysis still applies, but in
this case instead of an asymptotic result, we obtain the order result stated in the
theorem.

Finally, let us note that the number of common neighbours is a sum of independent
random indicator variables with Bernoulli distribution. The concentration
follows from the bound in Lemma 5.1. □



5.3. Edge length distribution. Finally, we give the proof of the theorem about
the edge length distribution. Remember that a long edge is an edge such that its
endpoints are at distance at least $r_\alpha$, where $r_\alpha$ is chosen so that a ball of radius
$r_\alpha$ has volume $n^{-\alpha}$. As in the previous subsection, the proof distinguishes three
cases, but now the three cases depend on whether the sphere of influence of a
vertex has radius greater than $r_\alpha$ (allowing the vertex to receive long edges) at
the beginning and the end of its life.

First, we need to recall a few known results: the behaviour of $N_k = N_k(n)$,
the number of vertices of in-degree $k = k(n)$ at time $n$, the number of edges
$M = M(n)$ at time $n$, and the upper bound for the size of the influence regions.
The following result was proven in [1].

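Theorem 5.3 ([1]). Suppose that $pA_1 < 1$. The following holds a.a.s. for every
$0 \le k \le (n/\log^8 n)^{pA_1/(4pA_1+2)}$:

$$N_k = (1 + o(1))\,c_k\,n,$$

where $c_0 = 1/(1 + pA_2)$ and for $k \ge 1$,

$$c_k = \frac{p^k}{1 + kpA_1 + pA_2}\prod_{j=0}^{k-1}\frac{jA_1 + A_2}{1 + jpA_1 + pA_2}.$$

Moreover,

$$M = (1 + o(1))\,\frac{pA_2}{1 - pA_1}\,n.$$

Note that

$$c_k = \frac{\Gamma\left(\frac{1}{pA_1} + \frac{A_2}{A_1}\right)}{pA_1\,\Gamma\left(\frac{A_2}{A_1}\right)}\cdot\frac{\Gamma\left(k + \frac{A_2}{A_1}\right)}{\Gamma\left(k + 1 + \frac{1}{pA_1} + \frac{A_2}{A_1}\right)}.$$

Suppose now that $k = k(n)$ tends to infinity together with $n$. Using Stirling's
asymptotic approximation of the Gamma function ($\Gamma(z) = (1 + o(1))\sqrt{2\pi}\,z^{z-1/2}e^{-z}$)
we can take $c_k$ to be

$$c_k = (1 + o(1))\,c\,k^{-1-\frac{1}{pA_1}},$$

with $c$ as in (13), and the following useful corollary is proved.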

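Corollary 5.4. Suppose that $pA_1 < 1$. Let $\omega = \omega(n)$ be any function tending to
infinity with $n$. The following holds a.a.s. for every $\omega \le k \le (n/\log^8 n)^{pA_1/(4pA_1+2)}$:

$$N_k = (1 + o(1))\,c\,k^{-1-\frac{1}{pA_1}}\,n,$$

where

$$c = \frac{\Gamma\left(\frac{1}{pA_1} + \frac{A_2}{A_1}\right)}{pA_1\,\Gamma\left(\frac{A_2}{A_1}\right)}. \tag{13}$$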
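The product formula for $c_k$, its Gamma-function form and the power-law approximation of Corollary 5.4 can be compared numerically. This is a minimal sketch with hypothetical function names; the Gamma form is evaluated through `math.lgamma` to avoid overflow, and the product form is intended for moderate $k$ only.

```python
import math


def c_k_product(k: int, p: float, a1: float, a2: float) -> float:
    """c_k from the product formula of Theorem 5.3 (also valid for k = 0)."""
    value = p ** k / (1.0 + k * p * a1 + p * a2)
    for j in range(k):
        value *= (j * a1 + a2) / (1.0 + j * p * a1 + p * a2)
    return value


def c_k_gamma(k: int, p: float, a1: float, a2: float) -> float:
    """The same constant written with Gamma functions."""
    a = a2 / a1
    b = 1.0 / (p * a1) + a
    log_value = (math.lgamma(b) - math.lgamma(a)
                 + math.lgamma(k + a) - math.lgamma(k + b + 1.0))
    return math.exp(log_value) / (p * a1)


def c_k_power_law(k: int, p: float, a1: float, a2: float) -> float:
    """Leading-order approximation c * k^(-1 - 1/(p*A1)) for large k."""
    a = a2 / a1
    b = 1.0 / (p * a1) + a
    c = math.exp(math.lgamma(b) - math.lgamma(a)) / (p * a1)
    return c * k ** (-1.0 - 1.0 / (p * a1))
```

For $p = 0.5$ and $A_1 = A_2 = 1$ the Gamma form reduces to $c_k = 4/((k+1)(k+2)(k+3))$, which sums to 1 over $k \ge 0$, as it should for the limiting fractions of vertices of each in-degree.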



In [1], it was proved that a.a.s. for all vertices we have that $\deg^-(v_i,n) =
O((\log^4 n)(n/i)^{pA_1})$, provided that $v_i$ was born at time $i$. Now, with Theorem 2.2
in hand, we get a stronger result, namely that a.a.s. for all $i \le t \le n$

$$\deg^-(v_i,t) = O\left((\omega\log n)\left(\frac{t}{i}\right)^{pA_1}\right),$$

where $\omega = \omega(n)$ is any function tending to infinity together with $n$. (Indeed,
for a contradiction suppose that $\deg^-(v_i,t) \ge (2\omega\log n)(t/i)^{pA_1}$ for some value
of $t$. Theorem 2.2 implies that $\deg^-(v_i,i) = (2 + o(1))\,\omega\log n$, which is clearly a
contradiction.) This implies the following result.

Theorem 5.5. Suppose that $pA_1 < 1$, and $\omega$ is a function that goes to infinity
together with $n$. The following holds a.a.s. for every vertex $v_i$ born at time $i$ and every $i \le t \le n$:

$$|S(v_i,t)| = O\left(\frac{\omega\log n}{t}\left(\frac{t}{i}\right)^{pA_1}\right).$$

The results given above are used in the proof of Theorem 4.1, which we are now
ready to give.



Proof of Theorem 4.1. Suppose first that $\alpha > 1$. Since the sphere of influence of
every vertex at every time of the process has volume (deterministically) at least $A_2/n \gg
n^{-\alpha}$, "long" edges can occur at every step of the process. A vertex $v$ will receive
a short edge precisely when the new vertex falls within a ball of radius $r_\alpha$ around
$v$, and thus automatically falls within the sphere of influence of $v$, and then links
to $v$. The probability that this happens equals $pn^{-\alpha}$. Thus, the expected number
of short edges pointing to a vertex born at time $i$ is $pn^{-\alpha}(n - i)$, and the total
number of short edges is $(1 + o(1))\,pn^{2-\alpha}/2 = o(n)$ and so is negligible compared
to the total number of edges. We conclude that a.a.s. almost all edges are long,
and the result holds by Theorem 5.3.

Suppose now that $1 - \frac{pA_1}{4pA_1+2} < \alpha < 1$. Let $e_v(\alpha)$ be the number of long edges
pointing to $v$, that is:

$$e_v(\alpha) = |\{w \in N^-(v) : d(v,w) \ge r_\alpha\}|,$$

where $N^-(v)$ is the in-neighbourhood of vertex $v$.

For a vertex $v$ to receive an edge of length greater than $r_\alpha$ at time $t$, its region
of influence must have radius at least $r_\alpha$, and thus have volume $|S(v,t)| \ge n^{-\alpha}$.
Key to the proof is Theorem 2.2 and its conclusion that the regions of influence
tend to be shrinking.

Let $\omega = \omega(n)$ be any function tending to infinity with $n$. First, we only consider vertices
whose final degree is at least $\omega\log n$. This is enough to get a lower bound for the
number of long edges. Later we will show that the contribution of the remaining
edges is negligible. Consider a vertex $v$ with final degree $k = \deg^-(v,n) \ge \omega\log n$.
It follows from Theorem 2.2 that a.a.s. for every vertex $v$ of degree $k \ge \omega\log n$ at
time $n$,

$$\deg^-(v,t) = (1 + o(1))\,k\left(\frac{t}{n}\right)^{pA_1}$$

for all $t_k \le t \le n$, where

$$t_k = n\left(\frac{\omega\log n}{k}\right)^{\frac{1}{pA_1}}.$$

(Note that $\deg(v,t_k) = (1 + o(1))\,\omega\log n$.) Therefore, we may assume, without loss
of generality, that for all $t_k \le t \le n$

$$|S(v,t)| = (1 + o(1))\,A_1 k\,n^{-pA_1}\,t^{pA_1 - 1}.$$

We distinguish three possible classes of vertices, based on their final degree:
vertices of high final degree can receive long edges from time $t_k$ until the end of
the process, $t = n$ (Case 1); vertices with final degree in a mid-range can receive
long edges from time $t_k$ until some time $t'_k$, $t_k < t'_k < n$ (Case 2); and vertices with
small final degree can never receive long edges after time $t_k$ (Case 3).

The cut-off values of the three cases are

$$k_{\min} = \left(\frac{n^{1-\alpha}}{A_1\,\omega\log n}\right)^{pA_1}(\omega\log n)$$

and $k_{\max} = n^{1-\alpha}/A_1$. Consider a vertex $v$ of degree $k$.

Case 1. Suppose that $k \ge k_{\max}$. Note that this implies that

$$|S(v,n)| = (1 + o(1))\,A_1 k/n \ge (1 + o(1))\,n^{-\alpha},$$

so for any constant $\varepsilon > 0$, and for any time $t$ in the range $t_k \le t \le (1 - \varepsilon)n$,
the sphere of influence of $v$ has radius greater than $r_\alpha$. This implies that $v$ has
an opportunity to receive long edges from time $t_k$ until the end of the process, or
very close to it.

For $t_k \le t \le n$, the probability that $v$ receives a short edge (an edge from a vertex
within distance $r_\alpha$) equals $p\min\{n^{-\alpha}, |S(v,t)|\} = (1 + o(1))\,pn^{-\alpha}$. Moreover, these
events are independent. Thus, w.e.p. the number of short edges is

$$(1 + o(1))\,pn^{-\alpha}(n - t_k) = (1 + o(1))\,pn^{1-\alpha},$$

where the last step uses the fact that $t_k = o(n)$ in this case.
The degree of $v$ at time $t_k$ is $O(\omega\log n)$, so we have that w.e.p.

$$e_v(\alpha) = \deg^-(v,n) - (1 + o(1))\,pn^{1-\alpha} + O(\omega\log n) = (1 + o(1))(k - pn^{1-\alpha}) \ge (1 - pA_1 + o(1))\,k.$$

Note that if $k \ge \omega n^{1-\alpha}$, then w.e.p. almost all edges pointing to $v$ are long.

Case 2. Let $\varepsilon > 0$ be some (arbitrarily small) constant. Suppose that
$(1 + \varepsilon)k_{\min} \le k \le (1 - \varepsilon)k_{\max}$. The upper bound on $k$ implies that
$|S(v,n)| \le (1 - \varepsilon + o(1))\,n^{-\alpha}$, so there is no chance for $v$ to receive long edges near the end of the
process. On the other hand, it follows from the lower bound on $k$ that
$|S(v,t_k)| \ge (1 + \varepsilon - o(1))\,n^{-\alpha}$, so if the new vertex at time $t_k$ falls within $S(v,t_k)$, there is a
positive probability that a long edge to $v$ is created.

Let

$$t'_k = \left(A_1 k\,n^{\alpha - pA_1}\right)^{\frac{1}{1-pA_1}}.$$

Note that $|S(v,t'_k)| = (1 + o(1))\,n^{-\alpha}$. Thus, the influence region of $v$ has radius
greater than $r_\alpha$ from time $t_k$ to $(1 - \delta)t'_k$, and radius less than $r_\alpha$ from time $(1 + \delta)t'_k$
to $n$, for some small $\delta > 0$.

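Thus, by a similar argument to the previous case, we obtain:

$$e_v(\alpha) \ge (1 + o(1))\sum_{t=t_k}^{(1-\delta)t'_k} p\left(A_1 k\,n^{-pA_1}\,t^{pA_1-1} - n^{-\alpha}\right)$$

$$= (1 - O(\delta))\left(k\,n^{-pA_1}(t'_k)^{pA_1} - p\,t'_k\,n^{-\alpha}\right)$$

$$= (1 - O(\delta))\,A_1^{\frac{pA_1}{1-pA_1}}\,k^{\frac{1}{1-pA_1}}\,n^{-\frac{pA_1(1-\alpha)}{1-pA_1}}\,(1 - pA_1).$$

Similarly, we get that $e_v(\alpha) \le (1 + O(\delta))\,A_1^{\frac{pA_1}{1-pA_1}}\,k^{\frac{1}{1-pA_1}}\,n^{-\frac{pA_1(1-\alpha)}{1-pA_1}}\,(1 - pA_1)$ and so

$$e_v(\alpha) = (1 + o(1))\,A_1^{\frac{pA_1}{1-pA_1}}\,k^{\frac{1}{1-pA_1}}\,n^{-\frac{pA_1(1-\alpha)}{1-pA_1}}\,(1 - pA_1)$$

by taking $\delta \to 0$.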
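The Case 2 computation is a sum approximated by an integral, and this can be checked directly. The sketch below sums $p(A_1 k n^{-pA_1} t^{pA_1-1} - n^{-\alpha})$ from a chosen start time up to $t'_k$ and compares it with the antiderivative $F(t) = k n^{-pA_1} t^{pA_1} - p\,t\,n^{-\alpha}$. The leading-order closed form is $F(t'_k)$; the start-time contribution $F(t_k)$ is of order $\omega\log n$, which the asymptotic statement absorbs but which is visible at finite $n$. The function names and parameter values are illustrative assumptions.

```python
import math


def t_k_prime(k: float, n: float, alpha: float, p: float, a1: float) -> float:
    """Time at which the leading-order volume |S(v, t)| drops to n^(-alpha)."""
    q = p * a1
    return (a1 * k * n ** (alpha - q)) ** (1.0 / (1.0 - q))


def antiderivative(t: float, k: float, n: float, alpha: float,
                   p: float, a1: float) -> float:
    """F(t) = k * n^(-pA1) * t^(pA1) - p * t * n^(-alpha)."""
    q = p * a1
    return k * n ** (-q) * t ** q - p * t * n ** (-alpha)


def case2_sum(k: float, n: float, alpha: float, p: float, a1: float,
              t_start: int) -> float:
    """Direct summation of the expected number of long edges in Case 2."""
    q = p * a1
    t_end = int(t_k_prime(k, n, alpha, p, a1))
    scale = a1 * k * n ** (-q)
    threshold = n ** (-alpha)
    return sum(p * (scale * t ** (q - 1.0) - threshold)
               for t in range(t_start, t_end + 1))


def case2_leading_order(k: float, n: float, alpha: float,
                        p: float, a1: float) -> float:
    """Closed form A1^(q/(1-q)) * k^(1/(1-q)) * n^(-q(1-alpha)/(1-q)) * (1-q)."""
    q = p * a1
    return (a1 ** (q / (1.0 - q)) * k ** (1.0 / (1.0 - q))
            * n ** (-q * (1.0 - alpha) / (1.0 - q)) * (1.0 - q))
```

With $n = 10^6$, $\alpha = 0.5$, $p = 0.7$, $A_1 = 1$ and $k = 900$, the closed form agrees with $F(t'_k)$ exactly, while the direct sum from $t_k$ matches $F(t'_k) - F(t_k)$ closely.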

Case 3. Finally, suppose that $\omega\log n \le k \le (1 - \varepsilon)k_{\min}$ for some $\varepsilon > 0$. Since
$|S(v,t_k)| \le (1 - \varepsilon + o(1))\,n^{-\alpha}$, the influence region has radius smaller than $r_\alpha$
from time $t_k$ until the end of the process. Thus, for such vertices, all edges they
receive in this time slot are short. Thus the only long edges $v$ can receive are those
received before $t_k$, so $e_v(\alpha) = O(\omega\log n)$. Trivially, the same property holds for
any vertex of degree smaller than $\omega\log n$.
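The cut-offs and the three classes can be sketched as follows. The code uses the leading-order formulas $k_{\max} = n^{1-\alpha}/A_1$, $k_{\min} = (n^{1-\alpha}/(A_1\omega\log n))^{pA_1}\,\omega\log n$, $t_k = n(\omega\log n/k)^{1/(pA_1)}$ and $|S(v,t)| = A_1 k n^{-pA_1} t^{pA_1-1}$; it ignores the $\varepsilon$-margins used in the proof, treats $\omega\log n$ as a single numeric parameter, and uses hypothetical function names. The numerical values are for illustration of the algebra only, since the statements are asymptotic.

```python
import math


def cutoffs(n: float, alpha: float, p_a1: float, a1: float,
            omega_log_n: float):
    """Return (k_min, k_max) for the three classes of final in-degree."""
    k_max = n ** (1.0 - alpha) / a1
    k_min = (n ** (1.0 - alpha) / (a1 * omega_log_n)) ** p_a1 * omega_log_n
    return k_min, k_max


def start_time(k: float, n: float, p_a1: float, omega_log_n: float) -> float:
    """t_k: time at which a vertex of final degree k has degree about omega*log n."""
    return n * (omega_log_n / k) ** (1.0 / p_a1)


def influence_volume(k: float, t: float, n: float,
                     p_a1: float, a1: float) -> float:
    """Leading-order volume A1 * k * n^(-pA1) * t^(pA1 - 1)."""
    return a1 * k * n ** (-p_a1) * t ** (p_a1 - 1.0)


def classify(k: float, n: float, alpha: float, p_a1: float, a1: float,
             omega_log_n: float) -> str:
    """Assign a final degree k to Case 1, 2 or 3 (margins ignored)."""
    k_min, k_max = cutoffs(n, alpha, p_a1, a1, omega_log_n)
    if k >= k_max:
        return "case 1"
    if k > k_min:
        return "case 2"
    return "case 3"
```

The cut-offs are exactly the degrees at which the volume equals $n^{-\alpha}$ at time $n$ (for $k_{\max}$) and at time $t_k$ (for $k_{\min}$).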

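In order to obtain upper and lower bounds on the total number of long edges, we
can use Theorem 5.3 and its corollary (Corollary 5.4) to calculate the number of
long edges pointing to vertices of final degree larger than $k_{\min}$ (Cases 1 and 2). Let
$c$ be as defined in Equation (13), and let $K = (n/\log^8 n)^{pA_1/(4pA_1+2)}$. By Corollary
5.4, $K$ is the upper bound on the values of $k$ for which we have concentration for
$N_k$. Note that the bounds on $\alpha$ imply that $k_{\max} \ll K$, and thus
$\sum_{k \ge k_{\max}} k^{-\gamma} = (1 + o(1))\sum_{k_{\max} \le k \le K} k^{-\gamma}$ for all $\gamma > 1$.

The number of long edges to vertices of the first type (Case 1) is a.a.s. equal to

$$E_1 = (1 + o(1))\sum_{k \ge k_{\max}} N_k\,(k - pn^{1-\alpha})$$

$$= (1 + o(1))\sum_{k \ge k_{\max}}\left(c\,k^{-1-\frac{1}{pA_1}}\,n\right)(k - pn^{1-\alpha})$$

$$= (1 + o(1))\left(cn\sum_{k \ge k_{\max}} k^{-\frac{1}{pA_1}} - cpn^{2-\alpha}\sum_{k \ge k_{\max}} k^{-1-\frac{1}{pA_1}}\right)$$

$$= (1 + o(1))\left(cn\,\frac{pA_1}{1-pA_1}\,k_{\max}^{1-\frac{1}{pA_1}} - cp^2A_1\,n^{2-\alpha}\,k_{\max}^{-\frac{1}{pA_1}}\right)$$

$$= (1 + o(1))\,c\,\frac{pA_1}{1-pA_1}\,A_1^{\frac{1-pA_1}{pA_1}}\,n^{2-\frac{1}{pA_1}+\alpha\frac{1-pA_1}{pA_1}}\,\bigl(1 - pA_1(1-pA_1)\bigr).$$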

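The number of long edges to vertices of the second type (Case 2) is a.a.s. equal to

$$E_2 = (1 + o(1))\sum_{k=k_{\min}}^{k_{\max}}\left(c\,k^{-1-\frac{1}{pA_1}}\,n\right)\left(A_1^{\frac{pA_1}{1-pA_1}}\,k^{\frac{1}{1-pA_1}}\,n^{-\frac{pA_1(1-\alpha)}{1-pA_1}}\,(1-pA_1)\right)$$

$$= (1 + o(1))\,c\,A_1^{\frac{pA_1}{1-pA_1}}\,(1-pA_1)\,n^{1-\frac{pA_1(1-\alpha)}{1-pA_1}}\sum_{k=k_{\min}}^{k_{\max}} k^{-1+\frac{2pA_1-1}{(1-pA_1)pA_1}}.$$

(Technically, to get a lower bound on $E_2$ we should sum over $k_{\min}(1 + \varepsilon) \le k \le
k_{\max}(1 - \varepsilon)$, and sum over $k_{\min}(1 + \varepsilon) \le k \le k_{\max}$ to get an upper bound. Since
the error in this summation is $(1 + O(\varepsilon))$, the result holds by taking $\varepsilon \to 0$.)

Since $1/2 < pA_1 < 1$, the exponent of $k$ in the summation is greater than $-1$,
and thus the behaviour of the summation is determined by its upper
limit $k_{\max}$. This leads to

$$E_2 = (1 + o(1))\,c\,A_1^{\frac{pA_1}{1-pA_1}}\,(1-pA_1)\,n^{1-\frac{pA_1(1-\alpha)}{1-pA_1}}\;\frac{(1-pA_1)pA_1}{2pA_1-1}\;k_{\max}^{\frac{2pA_1-1}{(1-pA_1)pA_1}}$$

$$= (1 + o(1))\,c\,A_1^{\frac{1-pA_1}{pA_1}}\,\frac{(1-pA_1)^2\,pA_1}{2pA_1-1}\,n^{2-\frac{1}{pA_1}+\alpha\frac{1-pA_1}{pA_1}}.$$

Since $E_1$ and $E_2$ are of the same order, we can take $E_1 + E_2$ as a lower bound for
$e(\alpha)$.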
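That $E_1$ and $E_2$ have the same order, and that both dominate a term of order $n^{\alpha}$, comes down to identities between exponents of $n$. These can be verified exactly with rational arithmetic. The sketch below uses hypothetical function names and writes $q$ for $pA_1$.

```python
from fractions import Fraction


def e1_exponent(alpha: Fraction, q: Fraction) -> Fraction:
    """Exponent of n in E_1: 2 - 1/q + alpha * (1 - q) / q."""
    return 2 - 1 / q + alpha * (1 - q) / q


def e2_exponent(alpha: Fraction, q: Fraction) -> Fraction:
    """Exponent of n in E_2 after substituting k_max = n^(1 - alpha) / A1.

    The prefactor contributes 1 - q(1 - alpha)/(1 - q), and the sum
    contributes (1 - alpha) * beta with beta = (2q - 1) / ((1 - q) q).
    """
    beta = (2 * q - 1) / ((1 - q) * q)
    return 1 - q * (1 - alpha) / (1 - q) + (1 - alpha) * beta


def tail_exponent(alpha: Fraction, q: Fraction) -> Fraction:
    """Exponent alpha * q / (1 - q) of the third regime in Theorem 4.2."""
    return alpha * q / (1 - q)
```

For example, with $q = 7/10$ and $\alpha = 9/10$ both `e1_exponent` and `e2_exponent` return $67/70$, which exceeds $\alpha$; at $\alpha = 1 - q$ the middle-regime exponent coincides with the tail exponent.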

In order to obtain an upper bound, we consider edges to vertices that are in
Case 3, that is, those that have final degree at most $k_{\min}$. It follows from
Theorem 5.5 that any vertex of small final degree that is able to receive long edges
has to have a time of birth $i \le i_{\max} = \omega n^{\alpha}\log n$. There
are obviously at most $i_{\max}$ such vertices, and each of them has $O(\omega\log n)$ long
edges. So the number of long edges that we did not count yet is at most:

$$E_3 = O\bigl((\omega n^{\alpha}\log n)(\omega\log n)\bigr) = n^{\alpha+o(1)}.$$

Since $E_3$ is of smaller order than $E_1 + E_2$, the result follows. □



For the proof of Theorem 4.2, we use a large part of the previous proof.

Proof of Theorem 4.2. For this theorem, we consider the expected value of $e(\alpha)$.
Thus, we can use the expected values of $N_k$, and do not need to consider the
cut-off on the values of $k$ for which the values of $N_k$ are concentrated. In [1], it
was shown that

$$\mathbb{E}(N_k) = (1 + o(1))\,c\,k^{-1-\frac{1}{pA_1}}\,n,$$

for all k > k„ 



Suppose first that $1/2 < pA_1 < 1$ and $1 - pA_1 < \alpha < 1$. Consider the proof of
Theorem 4.1. The three cases of this proof still hold as before; let $k_{\min}$ and $k_{\max}$
be as defined in this proof. As explained in this proof, concentration for the values
of $N_k$ holds only up to degree $K = n^{pA_1/(4pA_1+2)+o(1)}$. This affects the computation
of $E_1$. However, in this proof we only consider the expected value of $e(\alpha)$, so by
linearity of expectation, in the computation of $E_1$ we can use the expected values
of the $N_k$. This leads to the following expression for the expected number of long
edges to vertices of the first type (Case 1):

$$\mathbb{E}(E_1) = (1 + o(1))\sum_{k \ge k_{\max}} \mathbb{E}(N_k)\,(k - pn^{1-\alpha})$$

$$= (1 + o(1))\sum_{k \ge k_{\max}}\left(c\,k^{-1-\frac{1}{pA_1}}\,n\right)(k - pn^{1-\alpha}) = \Theta\left(n^{2-\frac{1}{pA_1}+\alpha\frac{1-pA_1}{pA_1}}\right), \tag{14}$$

where $c$ is as defined in Equation (13).



For the computation of $E_2$, we should note that we may not have concentration
of $N_k$ for the values of $k$ close to $k_{\max}$. However, we can use a similar calculation to
that used in the proof of Theorem 4.1, using the expected values of the $N_k$, to
obtain that $\mathbb{E}(E_2) = \Theta\left(n^{2-\frac{1}{pA_1}+\alpha\frac{1-pA_1}{pA_1}}\right)$.

The argument that $E_3$ is negligible compared to $\mathbb{E}(E_1)$ and $\mathbb{E}(E_2)$, as laid out
in the proof of Theorem 4.1, still holds here. Thus, we have that

$$\mathbb{E}(e(\alpha)) = \Theta\left(n^{2-\frac{1}{pA_1}+\alpha\frac{1-pA_1}{pA_1}}\right).$$

The result follows by taking the logarithm.

Next, consider the case where $1/2 < pA_1 < 1$ and $\alpha < 1 - pA_1$. It follows from
Theorem 2.2, and it was also shown in [1], that w.e.p. the maximum in-degree
in a graph produced by the SPA model is at most $K_m = O(n^{pA_1}\log^4 n)$. Since
$k_{\max} = n^{1-\alpha}/A_1 \gg K_m$, w.e.p. no vertices are in Case 1, so no vertices can receive
long edges until the end of the process.

For the vertices that are in Case 2, we can apply the same calculation as
in the proof of Theorem 4.1, while taking the expected values of the $N_k$ as in
the previous case. Since w.e.p. $K_m$ is an upper bound on the maximum degree,
the expected number of vertices of degree greater than $K_m$ is at most
$n\exp(-\Theta(\log^2 n))$. Hence, the expected number of long edges to such vertices
is at most $n^2\exp(-\Theta(\log^2 n)) = o(1)$. The expected number of long edges to
vertices of the second type (Case 2) therefore satisfies

$$\mathbb{E}(E_2) \le (1 + o(1))\sum_{k=k_{\min}}^{K_m}\left(c\,k^{-1-\frac{1}{pA_1}}\,n\right)\left(A_1^{\frac{pA_1}{1-pA_1}}\,k^{\frac{1}{1-pA_1}}\,n^{-\frac{pA_1(1-\alpha)}{1-pA_1}}\,(1-pA_1)\right)$$

$$= (1 + o(1))\,c\,A_1^{\frac{pA_1}{1-pA_1}}\,(1-pA_1)\,n^{1-\frac{pA_1(1-\alpha)}{1-pA_1}}\sum_{k=k_{\min}}^{K_m} k^{-1+\frac{2pA_1-1}{(1-pA_1)pA_1}}.$$

For a lower bound on $\mathbb{E}(E_2)$, we should sum over $k_{\min}(1 + \varepsilon) \le k \le n^{pA_1}$.

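Since $pA_1 > 1/2$, as before the value of the summation is determined
by its upper limit $K_m = O(n^{pA_1}\log^4 n)$. This leads to

$$\mathbb{E}(E_2) \le (1 + o(1))\,c\,A_1^{\frac{pA_1}{1-pA_1}}\,(1-pA_1)\,n^{1-\frac{pA_1(1-\alpha)}{1-pA_1}}\;\frac{(1-pA_1)pA_1}{2pA_1-1}\;K_m^{\frac{2pA_1-1}{(1-pA_1)pA_1}} = O\left(n^{\frac{pA_1\alpha}{1-pA_1}}\,g(n)^{\frac{2pA_1-1}{(1-pA_1)pA_1}}\right)$$

for some function $g$ of order $g(n) = \Theta(K_m/n^{pA_1}) = O(\log^4 n)$. For the lower bound
on $\mathbb{E}(E_2)$, we have the same summation, but with upper limit $n^{pA_1}$ instead of
$K_m$, and thus $\mathbb{E}(E_2) = \Omega\left(n^{\frac{pA_1\alpha}{1-pA_1}}\right)$.

Since $\log(g(n)) = o(\log n)$, we can combine lower and upper bound to see that

$$\frac{\log\mathbb{E}(E_2)}{\log n} \sim \frac{pA_1\,\alpha}{1-pA_1}.$$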

Finally, consider the vertices in Case 3. Here, the exact same argument as given
in the proof of Theorem 4.1 can be used to show that

$$\mathbb{E}(E_3) = O\bigl((\omega n^{\alpha}\log n)(\omega\log n)\bigr) = n^{\alpha+o(1)}.$$

Since this is of smaller order than $\mathbb{E}(E_2)$, the result follows.

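Finally, consider the case where $pA_1 < 1/2$. For $\alpha \in (1 - pA_1, 1)$ we have the
exact same expression for $\mathbb{E}(E_1)$ as for the case where $pA_1 > 1/2$, as given in
Equation (14). Thus

$$\mathbb{E}(E_1) = \Theta\left(n^{2-\frac{1}{pA_1}+\alpha\frac{1-pA_1}{pA_1}}\right) = o(n^{\alpha}),$$

where the last step follows since

$$2 - \frac{1}{pA_1} + \alpha\,\frac{1-pA_1}{pA_1} = 1 - \frac{1-pA_1}{pA_1}\,(1 - \alpha) < \alpha.$$

For $\alpha \in (0, 1 - pA_1)$ we have that w.e.p. $E_1 = 0$, so $\mathbb{E}(E_1) = \exp(-\Theta(\log^2 n))$.

For $E_2$, we have the same sum as before: let $K^* = k_{\max}$ if $\alpha \in (1 - pA_1, 1)$, and
$K^* = K_m$, an upper bound (w.e.p.) on the maximum degree, otherwise. Then

$$\mathbb{E}(E_2) = (1 + o(1))\sum_{k=k_{\min}}^{K^*}\left(c\,k^{-1-\frac{1}{pA_1}}\,n\right)\left(A_1^{\frac{pA_1}{1-pA_1}}\,k^{\frac{1}{1-pA_1}}\,n^{-\frac{pA_1(1-\alpha)}{1-pA_1}}\,(1-pA_1)\right)$$

$$= (1 + o(1))\,c\,A_1^{\frac{pA_1}{1-pA_1}}\,(1-pA_1)\,n^{1-\frac{pA_1(1-\alpha)}{1-pA_1}}\sum_{k=k_{\min}}^{K^*} k^{-1+\frac{2pA_1-1}{(1-pA_1)pA_1}}.$$

Since $pA_1 < 1/2$, the exponent of $k$ in the summation is smaller than $-1$, so in this case the value of the summation is determined
by its lower limit

$$k_{\min} = \left(\frac{n^{1-\alpha}}{A_1\,\omega\log n}\right)^{pA_1}(\omega\log n).$$

This leads to

$$\mathbb{E}(E_2) \le (1 + o(1))\,c\,A_1^{\frac{pA_1}{1-pA_1}}\,(1-pA_1)\,n^{1-\frac{pA_1(1-\alpha)}{1-pA_1}}\;\frac{(1-pA_1)pA_1}{1-2pA_1}\;k_{\min}^{\frac{2pA_1-1}{(1-pA_1)pA_1}} = o(n^{\alpha}),$$

where the last step follows since the exponent of $n$ in this expression equals $\alpha$, while the exponent of $(\omega\log n)$ in $k_{\min}^{\frac{2pA_1-1}{(1-pA_1)pA_1}}$ equals

$$\frac{(1-pA_1)(2pA_1-1)}{(1-pA_1)pA_1} = \frac{2pA_1-1}{pA_1} < 0.$$

Finally, the same estimate as before can be used to show that $E_3 \le n^{\alpha+o(1)}$, and
thus $\log(\mathbb{E}(e(\alpha)))/\log n \le \alpha + o(1)$.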

For the lower bound, note that all spheres of influence up to time $T = (A_2/2)\,n^{\alpha}$
have (deterministically) volume at least $A_2/T = 2n^{-\alpha}$. Therefore, a positive fraction of all
edges generated until time $T$ are long, and so a.a.s. $\Omega(n^{\alpha})$ is a lower bound for the
number of long edges and the proof is finished. □